Supervised Learning Classification Project: AllLife Bank Personal Loan Campaign¶

Problem Statement¶

Context¶

AllLife Bank is a US bank that has a growing customer base. The majority of these customers are liability customers (depositors) with varying sizes of deposits. The number of customers who are also borrowers (asset customers) is quite small, and the bank is interested in expanding this base rapidly to bring in more loan business and in the process, earn more through the interest on loans. In particular, the management wants to explore ways of converting its liability customers to personal loan customers (while retaining them as depositors).

A campaign that the bank ran last year for liability customers showed a healthy conversion rate of over 9%. This has encouraged the retail marketing department to devise campaigns with better-targeted marketing to increase the success ratio.

As a data scientist at AllLife Bank, you have to build a model that will help the marketing department identify the potential customers who have a higher probability of purchasing the loan.

Objective¶

To predict whether a liability customer will buy a personal loan, to understand which customer attributes are most significant in driving purchases, and to identify which segment of customers to target more.

Data Dictionary¶

  • ID: Customer ID
  • Age: Customer’s age in completed years
  • Experience: #years of professional experience
  • Income: Annual income of the customer (in thousand dollars)
  • ZIP Code: Home Address ZIP code.
  • Family: the Family size of the customer
  • CCAvg: Average spending on credit cards per month (in thousand dollars)
  • Education: Education Level. 1: Undergrad; 2: Graduate; 3: Advanced/Professional
  • Mortgage: Value of house mortgage if any. (in thousand dollars)
  • Personal_Loan: Did this customer accept the personal loan offered in the last campaign? (0: No, 1: Yes)
  • Securities_Account: Does the customer have securities account with the bank? (0: No, 1: Yes)
  • CD_Account: Does the customer have a certificate of deposit (CD) account with the bank? (0: No, 1: Yes)
  • Online: Do customers use internet banking facilities? (0: No, 1: Yes)
  • CreditCard: Does the customer use a credit card issued by any other bank (excluding AllLife Bank)? (0: No, 1: Yes)

Importing necessary libraries¶

In [ ]:
# Library to suppress warnings or deprecation notes
import warnings

warnings.filterwarnings("ignore")

# Libraries to help with reading and manipulating data

import pandas as pd
import numpy as np

# Library to split data
from sklearn.model_selection import train_test_split

# Libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Removes the limit for the number of displayed columns
pd.set_option("display.max_columns", None)
# Sets the limit for the number of displayed rows
pd.set_option("display.max_rows", 200)

# Libraries to build the logistic regression model
from sklearn.linear_model import LogisticRegression

# Libraries to build decision tree classifier
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree

# To tune different models
from sklearn.model_selection import GridSearchCV

# To perform statistical analysis
import scipy.stats as stats

#SFS
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from mlxtend.plotting import plot_sequential_feature_selection as plot_sfs

# To get different metric scores
from sklearn.metrics import (
    f1_score,
    accuracy_score,
    recall_score,
    precision_score,
    confusion_matrix,
    ConfusionMatrixDisplay,
    make_scorer,
)
In [ ]:
pip install uszipcode==1.0.1 'sqlalchemy-mate<2'
Collecting uszipcode==1.0.1
  Downloading uszipcode-1.0.1-py2.py3-none-any.whl.metadata (8.9 kB)
Collecting sqlalchemy-mate<2
  Downloading sqlalchemy_mate-1.4.28.4-py2.py3-none-any.whl.metadata (10 kB)
Requirement already satisfied: attrs in /usr/local/lib/python3.12/dist-packages (from uszipcode==1.0.1) (25.4.0)
Requirement already satisfied: requests in /usr/local/lib/python3.12/dist-packages (from uszipcode==1.0.1) (2.32.4)
Collecting pathlib-mate (from uszipcode==1.0.1)
  Downloading pathlib_mate-1.3.2-py3-none-any.whl.metadata (8.4 kB)
Collecting atomicwrites (from uszipcode==1.0.1)
  Downloading atomicwrites-1.4.1.tar.gz (14 kB)
  Preparing metadata (setup.py) ... done
Collecting fuzzywuzzy (from uszipcode==1.0.1)
  Downloading fuzzywuzzy-0.18.0-py2.py3-none-any.whl.metadata (4.9 kB)
Collecting haversine>=2.5.0 (from uszipcode==1.0.1)
  Downloading haversine-2.9.0-py2.py3-none-any.whl.metadata (5.8 kB)
Requirement already satisfied: SQLAlchemy>=1.4.0 in /usr/local/lib/python3.12/dist-packages (from uszipcode==1.0.1) (2.0.44)
Collecting SQLAlchemy>=1.4.0 (from uszipcode==1.0.1)
  Downloading SQLAlchemy-1.4.54-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (10 kB)
Requirement already satisfied: prettytable in /usr/local/lib/python3.12/dist-packages (from sqlalchemy-mate<2) (3.16.0)
Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.12/dist-packages (from SQLAlchemy>=1.4.0->uszipcode==1.0.1) (3.2.4)
Requirement already satisfied: wcwidth in /usr/local/lib/python3.12/dist-packages (from prettytable->sqlalchemy-mate<2) (0.2.14)
Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests->uszipcode==1.0.1) (3.4.4)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.12/dist-packages (from requests->uszipcode==1.0.1) (3.11)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/dist-packages (from requests->uszipcode==1.0.1) (2.5.0)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.12/dist-packages (from requests->uszipcode==1.0.1) (2025.10.5)
Downloading uszipcode-1.0.1-py2.py3-none-any.whl (35 kB)
Downloading sqlalchemy_mate-1.4.28.4-py2.py3-none-any.whl (77 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77.1/77.1 kB 6.9 MB/s eta 0:00:00
Downloading haversine-2.9.0-py2.py3-none-any.whl (7.7 kB)
Downloading SQLAlchemy-1.4.54-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 61.1 MB/s eta 0:00:00
Downloading fuzzywuzzy-0.18.0-py2.py3-none-any.whl (18 kB)
Downloading pathlib_mate-1.3.2-py3-none-any.whl (56 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.0/57.0 kB 5.4 MB/s eta 0:00:00
Building wheels for collected packages: atomicwrites
  Building wheel for atomicwrites (setup.py) ... done
  Created wheel for atomicwrites: filename=atomicwrites-1.4.1-py2.py3-none-any.whl size=6943 sha256=7ba518d744d6ae2d83ef59ad0e59d1c49d7ee6dffebd8a867cdcf93becdf8912
  Stored in directory: /root/.cache/pip/wheels/6b/37/a4/ae30755673c2d1e07228f13b4491fcaef62438f771d5012d07
Successfully built atomicwrites
Installing collected packages: fuzzywuzzy, SQLAlchemy, pathlib-mate, haversine, atomicwrites, sqlalchemy-mate, uszipcode
  Attempting uninstall: SQLAlchemy
    Found existing installation: SQLAlchemy 2.0.44
    Uninstalling SQLAlchemy-2.0.44:
      Successfully uninstalled SQLAlchemy-2.0.44
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-adk 1.17.0 requires sqlalchemy<3.0.0,>=2.0, but you have sqlalchemy 1.4.54 which is incompatible.
ipython-sql 0.5.0 requires sqlalchemy>=2.0, but you have sqlalchemy 1.4.54 which is incompatible.
Successfully installed SQLAlchemy-1.4.54 atomicwrites-1.4.1 fuzzywuzzy-0.18.0 haversine-2.9.0 pathlib-mate-1.3.2 sqlalchemy-mate-1.4.28.4 uszipcode-1.0.1

Loading the dataset¶

In [ ]:
# Mounting Google Drive to access the data
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [ ]:
#Loading data
data = pd.read_csv('/content/drive/My Drive/UTA - AIML/Loan_Modelling.csv')
In [ ]:
# copying data to another variable to avoid any changes to original data
loan = data.copy()

Data Overview¶

  • Observations
  • Sanity checks
In [ ]:
# Display 10 random sample rows of the dataset
loan.sample(10)
Out[ ]:
ID Age Experience Income ZIPCode Family CCAvg Education Mortgage Personal_Loan Securities_Account CD_Account Online CreditCard
2952 2953 33 8 182 94065 1 8.6 1 0 0 0 0 0 0
1085 1086 51 26 11 92612 2 0.0 1 0 0 1 0 0 0
3854 3855 31 6 83 94720 4 1.8 3 0 0 0 0 1 0
4954 4955 45 19 22 94904 3 1.5 1 0 0 0 0 1 1
4950 4951 47 23 19 90089 1 1.0 1 0 0 0 0 0 1
1959 1960 50 24 130 95833 1 1.0 1 0 0 0 0 0 0
3738 3739 54 28 45 95008 3 1.4 1 0 0 0 0 0 1
1905 1906 25 -1 112 92507 2 2.0 1 241 0 0 0 1 0
4143 4144 55 31 20 94720 2 0.3 1 0 0 0 0 1 0
1543 1544 52 26 101 93407 2 2.4 2 0 0 0 0 1 0
In [ ]:
# we will drop the ID column as it does not add any predictive value
loan.drop("ID", axis=1, inplace=True)
In [ ]:
# Display 5 random sample rows of the dataset to check dropping of ID column
loan.sample(5)
Out[ ]:
Age Experience Income ZIPCode Family CCAvg Education Mortgage Personal_Loan Securities_Account CD_Account Online CreditCard
4989 24 0 38 93555 1 1.0 3 0 0 0 0 1 0
3653 52 27 32 92521 2 2.0 2 0 0 0 0 0 1
2579 52 27 23 92780 1 0.4 3 0 0 0 0 0 0
1544 39 15 24 92123 1 1.0 1 116 0 0 0 1 1
2681 37 11 35 94609 2 0.8 3 0 0 0 0 0 0
In [ ]:
# viewing the shape of the data set
loan.shape
Out[ ]:
(5000, 13)
In [ ]:
#viewing the dataset attributes
loan.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 13 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Age                 5000 non-null   int64  
 1   Experience          5000 non-null   int64  
 2   Income              5000 non-null   int64  
 3   ZIPCode             5000 non-null   int64  
 4   Family              5000 non-null   int64  
 5   CCAvg               5000 non-null   float64
 6   Education           5000 non-null   int64  
 7   Mortgage            5000 non-null   int64  
 8   Personal_Loan       5000 non-null   int64  
 9   Securities_Account  5000 non-null   int64  
 10  CD_Account          5000 non-null   int64  
 11  Online              5000 non-null   int64  
 12  CreditCard          5000 non-null   int64  
dtypes: float64(1), int64(12)
memory usage: 507.9 KB

All variables are of type int except CCAvg, which is float. Next, let us display the five-point summary of the data.

In [ ]:
#Checking for null values
loan.isnull().sum()
Out[ ]:
0
Age 0
Experience 0
Income 0
ZIPCode 0
Family 0
CCAvg 0
Education 0
Mortgage 0
Personal_Loan 0
Securities_Account 0
CD_Account 0
Online 0
CreditCard 0

No Null values in the dataset

In [ ]:
#checking for duplicate values
loan.duplicated().sum()
Out[ ]:
np.int64(0)

No duplicated values in the dataset

In [ ]:
# viewing the variables datatypes
loan.dtypes
Out[ ]:
0
Age int64
Experience int64
Income int64
ZIPCode int64
Family int64
CCAvg float64
Education int64
Mortgage int64
Personal_Loan int64
Securities_Account int64
CD_Account int64
Online int64
CreditCard int64

In [ ]:
#checking for 5 point summary
loan.describe().T
Out[ ]:
count mean std min 25% 50% 75% max
Age 5000.0 45.338400 11.463166 23.0 35.0 45.0 55.0 67.0
Experience 5000.0 20.104600 11.467954 -3.0 10.0 20.0 30.0 43.0
Income 5000.0 73.774200 46.033729 8.0 39.0 64.0 98.0 224.0
ZIPCode 5000.0 93169.257000 1759.455086 90005.0 91911.0 93437.0 94608.0 96651.0
Family 5000.0 2.396400 1.147663 1.0 1.0 2.0 3.0 4.0
CCAvg 5000.0 1.937938 1.747659 0.0 0.7 1.5 2.5 10.0
Education 5000.0 1.881000 0.839869 1.0 1.0 2.0 3.0 3.0
Mortgage 5000.0 56.498800 101.713802 0.0 0.0 0.0 101.0 635.0
Personal_Loan 5000.0 0.096000 0.294621 0.0 0.0 0.0 0.0 1.0
Securities_Account 5000.0 0.104400 0.305809 0.0 0.0 0.0 0.0 1.0
CD_Account 5000.0 0.060400 0.238250 0.0 0.0 0.0 0.0 1.0
Online 5000.0 0.596800 0.490589 0.0 0.0 1.0 1.0 1.0
CreditCard 5000.0 0.294000 0.455637 0.0 0.0 0.0 1.0 1.0

Observations:

  1. The min Age is 23, the max is 67, and the average is 45 years old - the data seems reasonable and roughly uniformly distributed
  2. The min Experience is -3 years, which is not plausible and hence requires a closer look
  3. The min Income is USD 8K, the max is USD 224K, the mean is approx. USD 74K, and the median is USD 64K - the dataset seems to be right-skewed
  4. The ZIP codes shall not be treated as integer values as they reflect location - requires data pre-processing
  5. The min Family size is 1, the max is 4, and the median is 2 - the data seems reasonable, is almost uniformly distributed, and can be treated as a categorical variable as it holds only 4 values
  6. The min CCAvg is USD 0K (which may reflect customers who do not own credit cards), the max is USD 10K, and the mean is approx. USD 1.9K - the data seems reasonable and is right-skewed
  7. Education is a categorical variable where 1: Undergrad; 2: Graduate; 3: Advanced/Professional
  8. The min Mortgage is USD 0K, the max is USD 635K, and the median is USD 0K (mean approx. USD 56K) - the dataset is heavily right-skewed
  9. Personal_Loan is a categorical variable where 1: the customer accepted the personal loan offered in the last campaign and 0: the customer did not accept
  10. Securities_Account is a categorical variable where 1: the customer has a securities account and 0: the customer does not
  11. CD_Account is a categorical variable where 1: the customer has a CD account and 0: the customer does not
  12. Online is a categorical variable where 1: the customer uses online banking and 0: the customer does not
  13. CreditCard is a categorical variable where 1: the customer uses a credit card issued by another bank and 0: the customer does not
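Since several of these columns hold codes rather than magnitudes, one optional step (a minimal sketch on a hypothetical mini-frame, not actually applied to `loan` in this notebook) is casting them to pandas' `category` dtype so that grouping and plotting treat them as labels:

```python
import pandas as pd

# Toy frame standing in for a few columns of `loan`
df = pd.DataFrame({
    "Family": [1, 2, 4, 3],
    "Education": [1, 3, 2, 1],
    "Personal_Loan": [0, 0, 1, 0],
})

# Columns that hold category codes, not measurements
cat_cols = ["Family", "Education", "Personal_Loan"]
df[cat_cols] = df[cat_cols].astype("category")

print(df.dtypes.astype(str).tolist())  # all three become 'category'
```

Tree-based models can work with the integer codes directly, so this is a convenience for EDA rather than a requirement.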

Data Preprocessing¶

  • Missing value treatment
  • Feature engineering (if needed)
  • Outlier detection and treatment (if needed)
  • Preparing data for modeling
  • Any other preprocessing steps (if needed)

1. Experience column (Treating the -ve value rows)

In [ ]:
# displaying how many rows carry a negative value
print(f'There are {len(loan[loan["Experience"] < 0])} rows with a negative value')
There are 52 rows with a negative value
In [ ]:
# let us plot the distribution of the Experience variable and view the skewness (if any) to decide on the best imputation approach
sns.displot(x=loan["Experience"], kde=True)
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x7d0ed985e780>
No description has been provided for this image

The distribution is roughly uniform. Accordingly, the approach to impute the negative values is to replace them with the median.

In [ ]:
# Replacing negative Experience values with the column median
exp_median = loan["Experience"].median()
loan["Experience"] = loan["Experience"].apply(
    lambda x: exp_median if x < 0 else x
)
In [ ]:
# checking the value counts to confirm the imputation is successful
len(loan[loan["Experience"] < 0])
Out[ ]:
0

No more negative values in the variable Experience
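An alternative treatment (not the approach used above, shown only as a hedged sketch) is to read the negative entries as data-entry sign errors and take absolute values, or to floor them at zero with `clip`:

```python
import pandas as pd

# Toy series mirroring the Experience column with a few negative entries
exp = pd.Series([-1, -3, 0, 10, 25])

as_abs = exp.abs()           # treat negatives as sign errors
clipped = exp.clip(lower=0)  # floor at zero instead

print(as_abs.tolist(), clipped.tolist())
```

Either choice affects only 52 of 5000 rows, so the modeling impact is likely small.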

2. Zip code conversion

Using the "uszipcode" library, we extract the "City" and "State" from the zipcode of every customer

In [ ]:
#import the searchengine from uszipcode library
import uszipcode
from uszipcode import SearchEngine

#create the search engine
search = SearchEngine()

#create a function to get the city from the zipcode
def get_city(x):
    return search.by_zipcode(x).city

#create a function to get the state from the zipcode
def get_state(x):
    return search.by_zipcode(x).state
Download /root/.uszipcode/simple_db.sqlite from https://github.com/MacHu-GWU/uszipcode-project/releases/download/1.0.1.db/simple_db.sqlite ...
  1.00 MB downloaded ...
  2.00 MB downloaded ...
  3.00 MB downloaded ...
  4.00 MB downloaded ...
  5.00 MB downloaded ...
  6.00 MB downloaded ...
  7.00 MB downloaded ...
  8.00 MB downloaded ...
  9.00 MB downloaded ...
  10.00 MB downloaded ...
  11.00 MB downloaded ...
  Complete!
In [ ]:
#create 2 empty lists to fill with city and state for each customer
the_city_ = []
the_state_=[]

#create a for loop to loop on the zipcodes, extract the city and state and fill the lists
for i in np.arange(0,5000):
    try:
        city = get_city(loan["ZIPCode"].iloc[i])
        the_city_.append(city)
        state = get_state(loan["ZIPCode"].iloc[i])
        the_state_.append(state)
    except:
        the_city_.append(np.nan)
        the_state_.append(np.nan)
    continue
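The loop above queries the search engine once per row. Since many customers share a ZIP code, an equivalent but faster pattern is to look up each unique code once and broadcast the result with `.map`. The sketch below uses a stand-in `lookup_city` function so it runs without uszipcode; in the notebook it would call `get_city` instead:

```python
import numpy as np
import pandas as pd

# Stand-in for get_city above; the real version calls search.by_zipcode(z).city
def lookup_city(z):
    known = {91107: "Pasadena", 90089: "Los Angeles"}
    return known.get(z, np.nan)

zips = pd.Series([91107, 90089, 91107, 99999], name="ZIPCode")

# One lookup per unique ZIP code, then a vectorized map over all rows
city_by_zip = {z: lookup_city(z) for z in zips.unique()}
cities = zips.map(city_by_zip)
print(cities.tolist())
```

Unmatched codes map to NaN, exactly like the try/except branch in the loop above.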
In [ ]:
#Adding the city and state lists as new columns in the dataset
loan['City'] = the_city_
loan['State'] = the_state_
loan.head()
Out[ ]:
Age Experience Income ZIPCode Family CCAvg Education Mortgage Personal_Loan Securities_Account CD_Account Online CreditCard City State
0 25 1.0 49 91107 4 1.6 1 0 0 1 0 0 0 Pasadena CA
1 45 19.0 34 90089 3 1.5 1 0 0 1 0 0 0 Los Angeles CA
2 39 15.0 11 94720 1 1.0 1 0 0 0 0 0 0 Berkeley CA
3 35 9.0 100 94112 1 2.7 2 0 0 0 0 0 0 San Francisco CA
4 35 8.0 45 91330 4 1.0 2 0 0 0 0 0 1 Northridge CA
In [ ]:
#exploring the null values introduced in the dataframe by the city and state lists
loan['City'].isnull().value_counts()
Out[ ]:
count
City
False 4966
True 34

In [ ]:
loan['State'].isnull().value_counts()
Out[ ]:
count
State
False 4966
True 34

34 values are missing

In [ ]:
#extracting the zipcodes that are returning the nan values
zip_nan = loan[loan['City'].isnull()]
zip_nan['ZIPCode'].value_counts()
Out[ ]:
count
ZIPCode
92717 22
96651 6
92634 5
93077 1

There are 4 unique Zip codes reflecting as NAN in the city column.

In [ ]:
# Checking the same for the state column
zip_nan_state = loan[loan['State'].isnull()]
zip_nan_state['ZIPCode'].value_counts()
Out[ ]:
count
ZIPCode
92717 22
96651 6
92634 5
93077 1

The missing city and state values share the same ZIP codes. I will manually look up the 4 unique codes on Google and then replace the NaN entries with the actual values.

In [ ]:
#Create a dictionary with the Zip code googled City and State
zip_dict = {'92717':'Irvine, CA',
             '96651':'Rudno nad Hronom, BC',
             '92634':'Fullerton, CA',
             '93077':'Ventura, CA'
            }

#Create a function to fill the missing values
def fill_nan(data, indxs, value, column):
    for i in indxs:
        data[column].iloc[i]=value

#Create a for loop to fill in the missing city and state values
for i in zip_dict.keys():
    indxs = loan[loan['ZIPCode']==int(i)].index
    fill_nan(loan, indxs, zip_dict[str(i)].split(',')[0], 'City')
    fill_nan(loan, indxs, zip_dict[str(i)].split(',')[1], 'State')

#confirm null values are removed in City and State columns
loan.isnull().sum()
Out[ ]:
0
Age 0
Experience 0
Income 0
ZIPCode 0
Family 0
CCAvg 0
Education 0
Mortgage 0
Personal_Loan 0
Securities_Account 0
CD_Account 0
Online 0
CreditCard 0
City 0
State 0

In [ ]:
#Display value counts of states
loan['State'].value_counts()
Out[ ]:
count
State
CA 4966
CA 28
BC 6

There are two main state labels, CA (the majority) and BC, yet " CA" should be merged into "CA" and " BC" into "BC" (the manually filled values carry a leading space) for better homogeneity.

In [ ]:
loan['State'].replace(' CA','CA',inplace=True)
loan['State'].replace(' BC','BC',inplace=True)
loan['State'].value_counts()
Out[ ]:
count
State
CA 4994
BC 6
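The two `replace` calls above target the exact values seen in this dataset. Assuming the only defect is stray surrounding whitespace, `str.strip` handles every such case in one pass; a small sketch on a toy series:

```python
import pandas as pd

# Toy series mirroring the mixed labels observed in loan['State']
state = pd.Series(["CA", " CA", "BC", " BC", "CA"])

state = state.str.strip()  # drop leading/trailing whitespace in one pass
print(state.value_counts().to_dict())
```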

In [ ]:
#Display value counts of Cities
loan['City'].value_counts()
Out[ ]:
count
City
Los Angeles 375
San Diego 269
San Francisco 257
Berkeley 241
Sacramento 148
... ...
Sierra Madre 1
Ladera Ranch 1
Sausalito 1
Tahoe City 1
Stinson Beach 1

245 rows × 1 columns


There are 245 different cities in the dataset; most customers are from Los Angeles, San Diego, San Francisco, Berkeley, and Sacramento.

Finally, we drop the ZIP Code column

In [ ]:
loan.drop('ZIPCode', axis=1, inplace=True)

Exploratory Data Analysis.¶

  • EDA is an important part of any project involving data.
  • It is important to investigate and understand the data better before building a model with it.
  • A few questions have been mentioned below which will help you approach the analysis in the right manner and generate insights from the data.
  • A thorough analysis of the data, in addition to the questions mentioned below, should be done.

Questions:

  1. What is the distribution of mortgage attribute? Are there any noticeable patterns or outliers in the distribution?
  2. How many customers have credit cards?
  3. What are the attributes that have a strong correlation with the target attribute (personal loan)?
  4. How does a customer's interest in purchasing a loan vary with their age?
  5. How does a customer's interest in purchasing a loan vary with their education?

1. Univariate analysis

i. Visualizing the numerical data¶

From the five-point summary, it is observed that Age, Experience, Income, CCAvg, and Mortgage are numerical and continuous in nature, so we will define and apply a function that plots a histogram and boxplot for each variable.

In [ ]:
#we will define a function to plot the boxplot and histogram for all numerical variables
def histogram_boxplot(data, feature, figsize=(12, 7), kde=False, bins=None):
    """
    Boxplot and histogram combined

    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (12,7))
    kde: whether to the show density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid= 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a star will indicate the mean value of the column
    sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins, palette="winter"
    ) if bins else sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2
    )  # For histogram
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram

Age¶

In [ ]:
histogram_boxplot(loan,'Age')
No description has been provided for this image

This answers question 4 in the EDA section of the project¶

Observations:

  1. The min of age is 23, while the max is 67. Average age is 45 Years old

  2. The data roughly follows a uniform distribution.

  3. The largest number of clients is in the 58-60 age range; there are also peaks at 30-32, 38-40, 44-46 and 52-54 years old

  4. There are no outliers observed.

Experience¶

In [ ]:
histogram_boxplot(loan,'Experience')
No description has been provided for this image

Observation

  1. The min Experience is 0 years, while the max is 43 years and the mean is approximately 20 years.

  2. The data is roughly uniform, with peaks at 12-14 years and 28-30 years

  3. There are no outliers observed

Income¶

In [ ]:
histogram_boxplot(loan,'Income')
No description has been provided for this image

Observation

  1. The min Income is USD 8,000, while the max is USD 224,000; the mean is approximately USD 74,000 and the median USD 64,000

  2. The dataset is right-skewed

  3. There is a noticeable number of outliers, but they seem consistent with the provided data

  4. No action may be required for outlier treatment

CCAvg - Credit Card Avg¶

In [ ]:
histogram_boxplot(loan,'CCAvg')
No description has been provided for this image

Observation

  1. The min CCAvg is USD 0

  2. This may reflect customers who do not have credit cards

  3. The max CCAvg is USD 10,000 per month and the mean is approximately USD 1,900

  4. The dataset is right-skewed, with a number of outliers that are consistent with the provided data

  5. No action is recommended for the outliers

Mortgage¶

In [ ]:
histogram_boxplot(loan,'Mortgage')
No description has been provided for this image

Observation

  1. The min Mortgage is USD 0, while the max is USD 635,000; the median is USD 0 and the mean approximately USD 56,500

  2. The dataset is heavily right-skewed

  3. It would be better to separate the USD 0 mortgages from the >USD 0 mortgage values in order to be able to visualize the distribution better, and to plot the data again

In [ ]:
#extracting the customers with mortgage values > 0
mortgage = loan[loan['Mortgage']>0]
print(f'There are {len(mortgage)} customers with a mortgage, forming {round((len(mortgage)/5000)*100)}% of the dataset')
There are 1538 customers with a mortgage, forming 31% of the dataset
In [ ]:
#plotting mortgage of the customers again
histogram_boxplot(mortgage,'Mortgage')
No description has been provided for this image

This answers question 1 in the EDA section of the project¶

Observation

The mortgage distribution for customers with a mortgage is right-skewed, with a minimum of approximately USD 99,000, a maximum of USD 635,000, and a mean between USD 180,000 and USD 200,000. This is much easier to read than the previous visualization.

ii. Visualizing the categorical data¶

From the five-point summary, it is observed that City, Family size, Education, Personal_Loan, CD_Account, Online and CreditCard are categorical in nature, so we will define and apply a function to plot a labelled barplot for each variable.

In [ ]:
# function to create labeled barplots


def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top

    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """

    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 2, 6))
    else:
        plt.figure(figsize=(n + 2, 6))

    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
    )

    for p in ax.patches:
        if perc == True:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category

        x = p.get_x() + p.get_width() / 2  # width of the plot
        y = p.get_height()  # height of the plot

        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the percentage

    plt.show()  # show the plot

Cities (derived from ZIP codes)¶

In [ ]:
plt.figure(figsize=(15,45))
sns.countplot(data=loan, y='City', order=loan['City'].value_counts().index)
Out[ ]:
<Axes: xlabel='count', ylabel='City'>
No description has been provided for this image

Observation

  1. Approx. 25.8% of the customers in the dataset reside in Los Angeles, San Diego, San Francisco, Berkeley and Sacramento (the top 5 cities in the dataset)

  2. The top city is Los Angeles, where 7.5% of the customers reside

  3. Overview of the top 5 cities:

     City          | No. of customers
     Los Angeles   | 375
     San Diego     | 269
     San Francisco | 257
     Berkeley      | 241
     Sacramento    | 148

Family size¶

In [ ]:
labeled_barplot(loan,'Family',perc=True)
No description has been provided for this image

Observation

  1. The most common family size is 1 (29.4% of customers)

  2. Followed by size 2 (25.9%)

  3. The 3rd most common is size 4 (24.4%)

  4. Finally, family size 3 accounts for 20.2% of the dataset

Education¶

In [ ]:
labeled_barplot(loan,'Education',perc=True)
No description has been provided for this image

This answers question 5 in the EDA section of the project¶

Observation

  1. 41.9% of customers are 1: Undergrad

  2. 28.1% of customers are 2: Graduate

  3. 30.0% of customers are 3: Advanced/Professional

Personal Loan¶

In [ ]:
labeled_barplot(loan,'Personal_Loan',perc=True)
No description has been provided for this image

This answers question 3 in the EDA section of the project¶

Observation

  1. 90.4% of customers did not accept a loan

  2. 9.6% of customers accepted a loan

  3. This is broadly consistent with the 9%+ conversion rate from last year's campaign; the 0.6% difference may merit a closer look. Note that the target classes are highly imbalanced
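Because only about 9.6% of customers accepted the loan, it will matter later (when the data is split for modeling) to preserve this ratio in both the train and test sets. Below is a hedged sketch using the already-imported `train_test_split` with `stratify`, run on a toy frame with a similar ~10% positive rate rather than on `loan` itself:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame with a ~10% positive rate, standing in for `loan`
df = pd.DataFrame({
    "Income": range(100),
    "Personal_Loan": [1 if i % 10 == 0 else 0 for i in range(100)],
})

X = df.drop("Personal_Loan", axis=1)
y = df["Personal_Loan"]

# stratify=y keeps the 0/1 ratio identical in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)
print(round(y_train.mean(), 2), round(y_test.mean(), 2))
```

Without `stratify`, a random split could leave the small positive class under-represented in the test set.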

CD_Account¶

In [ ]:
labeled_barplot(loan,'CD_Account',perc=True)
No description has been provided for this image

Observation

  1. 94% of customers do not have a CD_Account

  2. Only 6% have a CD_Account

Online¶

In [ ]:
labeled_barplot(loan,'Online',perc=True)
No description has been provided for this image

Observation

  1. 59.7% of customers use the online banking services

  2. 40.3% of customers do not use the online banking services

CreditCard¶

In [ ]:
labeled_barplot(loan,'CreditCard',perc=True)
No description has been provided for this image

This answers question 2 in the EDA section of the project¶

Observation

  1. 70.6% of customers do not use a credit card issued by a different bank

  2. 29.4% of customers use a credit card issued by a different bank

2. Bivariate analysis

I would like to start with a simple pair plot to see whether any correlations between the dataset variables emerge.

In [ ]:
# sns.pairplot(loan, diag_kind='kde')
sns.pairplot(loan, hue="Personal_Loan")  # pairplot creates its own grid; a prior plt.figure call would leave an empty figure
plt.show()
No description has been provided for this image

Observations:

The orange spots in the plot represent the customers who accepted a personal loan and the blue spots show the ones who did not.

a. From the univariate analysis of the personal loan dataset done above, it was observed that only 9.6% of the customers accepted the personal loan, hence this observation is consistent with the pair plot where the majority of the spots are blue

b. It is observed that there is a very strong linear correlation between Age and Experience in the dataset

c. It is also observed that there is a slight correlation between Income and CCAvg, as shown in the pair plot above

d. A high concentration of customers who accepted a personal loan is observed at the following points in the dataset:

  1. Higher income levels (starting at approx. USD 100,000 and above)
  2. Higher CCAvg (starting at approx. USD 3,000 and above)
  3. Higher mortgage values (starting at approx. USD 300,000 and above)
  4. Customers with CD accounts
  5. Customers who use credit cards issued by banks other than the bank under review
  6. Families of sizes 3 and 4
  7. Customers with Education 2 (Graduate) and 3 (Advanced/Professional)

Having an idea of which variables are likely to carry medium-to-high predictive power is very useful before building the classification models.

In [ ]:
plt.figure(figsize=(15,10))
sns.heatmap(loan.select_dtypes(include=np.number).corr(),annot=True,cmap='YlGnBu')
Out[ ]:
<Axes: >
No description has been provided for this image

Observation

The heatmap confirms what was seen in the pairplot above:

  1. The correlation between Age and Experience is very high (0.98)

  2. The correlation between Income and CCAvg is moderate (0.65)

  3. All other correlation values are quite small and need no further consideration

  4. Income and Mortgage are also positively correlated, but only weakly (below 0.25)
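To make EDA question 3 concrete, the target column of the correlation matrix can be sorted by absolute value to rank the attributes. A minimal sketch on a toy numeric frame (in the notebook this would run on `loan.select_dtypes(include=np.number)` instead of the stand-in `df` below):

```python
import pandas as pd

# Toy numeric frame standing in for the numeric columns of `loan`
df = pd.DataFrame({
    "Income":        [40, 90, 150, 200, 35, 180],
    "CCAvg":         [1.0, 2.5, 4.0, 6.0, 0.5, 5.0],
    "Personal_Loan": [0, 0, 1, 1, 0, 1],
})

# Correlation of every feature with the target, strongest first
corr_with_target = (
    df.corr()["Personal_Loan"]
      .drop("Personal_Loan")
      .sort_values(key=abs, ascending=False)
)
print(corr_with_target.index.tolist())
```

Sorting by `key=abs` ranks strong negative correlations alongside strong positive ones.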

In [ ]:
plt.figure(figsize=(10,5))
plt.title('CountPlot:Education Category who have taken Personal Loans')
sns.countplot(loan, x="Personal_Loan", hue='Education')
plt.show()
No description has been provided for this image

Observation

The countplot above shows that:

  1. Among customers who accepted a personal loan, Education levels 2 (Graduate) and 3 (Advanced/Professional) are more common than level 1 (Undergrad)

  2. Customers with higher education therefore appear more likely to take a personal loan, consistent with the pairplot observations above

In [ ]:
import plotly.express as px
Figure_3D=px.scatter_3d(loan,x='Personal_Loan', y='Age',z='Income',color='Age');
Figure_3D.show()
In [ ]:
Figure_3D.write_html('/content/drive/My Drive/Addedum image to project.html')

Observation

The Plotly figure above is a 3-D scatter plot that illustrates the relationship between age, income, and customers' willingness to accept a personal loan

NB: To see this 3-D image, please see the Addendum 1 uploaded as a separate html file.

Observing effect of the variables on our target variable (Personal loan)¶

a. Plotting the target Vs distribution of numerical variables¶

Let us plot the target against the distribution of each variable to understand the data set further prior to building the model

In [ ]:
### function to plot distributions wrt our target variables


def distribution_plot_wrt_target(data, predictor, target):

    fig, axs = plt.subplots(2, 2, figsize=(10, 7))

    target_uniq = data[target].unique()

    axs[0, 0].set_title("Distribution of Target for target=" + str(target_uniq[0]))
    sns.histplot(
        data=data[data[target] == target_uniq[0]],
        x=predictor,
        kde=True,
        ax=axs[0, 0],
        color="teal",
        stat="density",
    )

    axs[0, 1].set_title("Distribution of Target for target=" + str(target_uniq[1]))
    sns.histplot(
        data=data[data[target] == target_uniq[1]],
        x=predictor,
        kde=True,
        ax=axs[0, 1],
        color="orange",
        stat="density",
    )

    axs[1, 0].set_title("Boxplot w.r.t Target")
    sns.boxplot(data=data, x=target, y=predictor, ax=axs[1, 0], palette="gist_rainbow")

    axs[1, 1].set_title("Boxplot (without outliers) w.r.t Target")
    sns.boxplot(
        data=data,
        x=target,
        y=predictor,
        ax=axs[1, 1],
        showfliers=False,
        palette="gist_rainbow",
    )

    plt.tight_layout()
    plt.show()
In [ ]:
#checking the columns heads in the dataframe
loan.columns
Out[ ]:
Index(['Age', 'Experience', 'Income', 'Family', 'CCAvg', 'Education',
       'Mortgage', 'Personal_Loan', 'Securities_Account', 'CD_Account',
       'Online', 'CreditCard', 'City', 'State'],
      dtype='object')

Age Vs Personal Loan¶

In [ ]:
distribution_plot_wrt_target(loan,'Age','Personal_Loan')
No description has been provided for this image

Observation

  1. The mean age of customers accepting and not accepting personal loans is very close, at about 45 years

  2. As seen in the pair plot, Age has no clear relationship with the target variable, so its predictive power is expected to be small

Experience Vs Personal Loan¶

In [ ]:
distribution_plot_wrt_target(loan,'Experience','Personal_Loan')
No description has been provided for this image

Observation

  1. The mean experience of customers accepting and not accepting personal loans is very close, at about 20 years

  2. As seen in the pair plot, Experience has no clear relationship with the target variable, so its predictive power is expected to be small

Income Vs Personal Loan¶

In [ ]:
distribution_plot_wrt_target(loan,'Income','Personal_Loan')
No description has been provided for this image

Observations:

  1. As observed earlier, the Income variable differs greatly between customers who accepted and customers who did not accept personal loans

  2. The mean income of customers who did not accept the loan is approx. USD 65,000

  3. The mean income of customers who accepted the loan is approx. USD 145,000

  4. Therefore, the income level has a high impact on the customer's decision to take a personal loan

  5. The higher the income, the more likely the customer is to accept a personal loan
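The class-wise means quoted above can be computed directly with a `groupby`. A minimal sketch on synthetic data (the real `loan` dataframe, with Income in thousand dollars, is loaded earlier in the notebook):

```python
import pandas as pd

# Synthetic stand-in for the loan dataframe (Income in thousand dollars)
demo = pd.DataFrame({
    "Personal_Loan": [0, 0, 0, 1, 1],
    "Income":        [60, 70, 65, 140, 150],
})

# Mean income per target class - the boxplots above summarise the same quantity
mean_income = demo.groupby("Personal_Loan")["Income"].mean()
print(mean_income)  # class 0 -> 65.0, class 1 -> 145.0
```

The same pattern (`loan.groupby('Personal_Loan')['Income'].mean()`) reproduces the approx. USD 65,000 vs USD 145,000 figures on the real data.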

Mortgage Vs Personal Loan¶

In [ ]:
distribution_plot_wrt_target(loan,'Mortgage','Personal_Loan')
No description has been provided for this image

Observation

  1. Customers paying a mortgage are more likely to accept a personal loan

  2. The distribution is heavily right-skewed due to the high number of customers not paying a mortgage

  3. In view of point (2), it is worth plotting the distribution for mortgage-paying customers only

  4. This should give a clearer view of the mean mortgage value for customers who accepted and customers who did not accept a personal loan
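The `mortgage` dataframe used in the next cell holds only customers with a non-zero mortgage. A minimal sketch of how such a subset can be built (synthetic data; the actual subset is created earlier in the notebook):

```python
import pandas as pd

# Synthetic stand-in; Mortgage is in thousand dollars, 0 means no mortgage
demo = pd.DataFrame({
    "Mortgage":      [0, 0, 101, 300, 0, 250],
    "Personal_Loan": [0, 0, 0,   1,   0, 1],
})

# Keep only the mortgage-paying customers, removing the heavy spike at 0
mortgage_only = demo[demo["Mortgage"] > 0]
print(mortgage_only.shape)  # (3, 2)
```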

In [ ]:
#distribution of customers paying mortgage only
distribution_plot_wrt_target(mortgage,'Mortgage','Personal_Loan')
No description has been provided for this image

Observation

  1. The mean mortgage of customers not accepting a personal loan is approx. USD 150,000, much less than the approx. USD 290,000 mean of customers accepting one

  2. The higher the mortgage value, the more likely a customer is to accept a personal loan

CCAvg Vs Personal Loan¶

In [ ]:
distribution_plot_wrt_target(loan,'CCAvg','Personal_Loan')
No description has been provided for this image

Observation

  1. The mean CCAvg of customers not accepting a personal loan is approx. USD 1,600 per month, much less than the approx. USD 4,000 mean of customers accepting one

  2. Customers who accept a personal loan spend more on their credit cards each month

b. Plotting distribution of categorical variables Vs Target variable (Personal Loan)¶

In [ ]:
# function to plot stacked bar chart


def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart

    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(count + 5, 6))
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()

Family Vs Personal Loan¶

In [ ]:
stacked_barplot(loan,'Family','Personal_Loan')
Personal_Loan     0    1   All
Family                        
All            4520  480  5000
4              1088  134  1222
3               877  133  1010
1              1365  107  1472
2              1190  106  1296
------------------------------------------------------------------------------------------------------------------------
No description has been provided for this image

Observation

  1. Families of size 3 or 4 have a greater propensity to accept a personal loan

  2. Hence, we can conclude that as family size grows, customers become more willing to accept personal loans

Education Vs Personal Loan¶

In [ ]:
stacked_barplot(loan,'Education','Personal_Loan')
Personal_Loan     0    1   All
Education                     
All            4520  480  5000
3              1296  205  1501
2              1221  182  1403
1              2003   93  2096
------------------------------------------------------------------------------------------------------------------------
No description has been provided for this image

Observation

  1. Customers with education levels 2 and 3 are more willing to accept a personal loan than customers with education level 1

  2. The highest being customers with education level 3

Securities Account Vs Personal Loan¶

In [ ]:
stacked_barplot(loan,'Securities_Account','Personal_Loan')
Personal_Loan          0    1   All
Securities_Account                 
All                 4520  480  5000
0                   4058  420  4478
1                    462   60   522
------------------------------------------------------------------------------------------------------------------------
No description has been provided for this image

Observation

  1. Customers with a Securities account are more willing to accept a personal loan

CD_Account Vs Personal Loan¶

In [ ]:
stacked_barplot(loan,'CD_Account','Personal_Loan')
Personal_Loan     0    1   All
CD_Account                    
All            4520  480  5000
0              4358  340  4698
1               162  140   302
------------------------------------------------------------------------------------------------------------------------
No description has been provided for this image

Observation

  1. Customers with a CD account are more willing to accept a personal loan

Online Vs Personal Loan¶

In [ ]:
stacked_barplot(loan,'Online','Personal_Loan')
Personal_Loan     0    1   All
Online                        
All            4520  480  5000
1              2693  291  2984
0              1827  189  2016
------------------------------------------------------------------------------------------------------------------------
No description has been provided for this image

Observation

  1. Customers who use internet banking services and those who do not behave very similarly

  2. The likelihood of accepting a personal loan is almost the same in both groups, as shown in the plot above

CreditCard Vs Personal Loan¶

In [ ]:
stacked_barplot(loan,'CreditCard','Personal_Loan')
Personal_Loan     0    1   All
CreditCard                    
All            4520  480  5000
0              3193  337  3530
1              1327  143  1470
------------------------------------------------------------------------------------------------------------------------
No description has been provided for this image

Observation

  1. Customers who use credit cards issued by other banks and those who do not are almost equally likely to accept personal loans

City Vs Personal Loan¶

For better visualization, I will apply a threshold to pick the most common cities among the customers and group the remaining cities under 'others'.

The resultant dataframe created will be used to visualize how most common cities would vary with the target - personal loan.

In [ ]:
#assigning the threshold of 50
cities = loan['City'].value_counts()
threshold = 50
cities[cities.values >= threshold]
Out[ ]:
count
City
Los Angeles 375
San Diego 269
San Francisco 257
Berkeley 241
Sacramento 148
Palo Alto 130
Stanford 127
Davis 121
La Jolla 112
Santa Barbara 103
San Jose 96
Irvine 80
Santa Clara 77
Monterey 72
Pasadena 71
Oakland 55
Newbury Park 53
Claremont 52
Menlo Park 52
Santa Cruz 51
El Segundo 50

In [ ]:
#the threshold of 50 looks appropriate, so let's extract the names of the cities
cities_list = cities[cities.values >= threshold].index.tolist()
print("Cities names taken into consideration:\n", len(cities_list))
print(cities_list)
Cities names taken into consideration:
 21
['Los Angeles', 'San Diego', 'San Francisco', 'Berkeley', 'Sacramento', 'Palo Alto', 'Stanford', 'Davis', 'La Jolla', 'Santa Barbara', 'San Jose', 'Irvine', 'Santa Clara', 'Monterey', 'Pasadena', 'Oakland', 'Newbury Park', 'Claremont', 'Menlo Park', 'Santa Cruz', 'El Segundo']
In [ ]:
#create a copy of the dataframe, as this transformation is only for better visualization
loan_t = loan.copy()
In [ ]:
loan_t['City'] = loan_t['City'].apply(lambda x:x if x in cities_list else 'others')
In [ ]:
#function to plot horizontal stacked bar chart
def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart

    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="barh", stacked=True, figsize=(count + 5, 9))
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()
In [ ]:
#plotting a horizontal stacked bar plot of the transformed dataset
stacked_barplot(loan_t,'City','Personal_Loan')
Personal_Loan     0    1   All
City                          
All            4520  480  5000
others         2181  227  2408
Los Angeles     337   38   375
Berkeley        214   27   241
San Diego       248   21   269
San Francisco   238   19   257
Palo Alto       114   16   130
La Jolla         97   15   112
Stanford        114   13   127
Sacramento      135   13   148
Santa Clara      65   12    77
San Jose         85   11    96
Irvine           69   11    80
Pasadena         61   10    71
Santa Barbara    95    8   103
Santa Cruz       44    7    51
Monterey         66    6    72
Davis           115    6   121
El Segundo       45    5    50
Oakland          50    5    55
Claremont        48    4    52
Menlo Park       48    4    52
Newbury Park     51    2    53
------------------------------------------------------------------------------------------------------------------------
No description has been provided for this image

Observation

  1. Santa Clara shows the highest ratio of customers willing to accept personal loans

  2. Pasadena shows the second-highest ratio

  3. Irvine shows the third-highest ratio

State Vs Personal_Loan¶

In [ ]:
stacked_barplot(loan,'State','Personal_Loan')
Personal_Loan     0    1   All
State                         
CA             4514  480  4994
All            4520  480  5000
BC                6    0     6
------------------------------------------------------------------------------------------------------------------------
No description has been provided for this image

Observations

  1. Customers living in the state of California, USA are more willing to accept personal loans than the customers in British Columbia, Canada

Key Insights based on EDA¶

The Five point summary for all the variables including the city and state added columns:

In [ ]:
loan.describe(include='all').T
Out[ ]:
count unique top freq mean std min 25% 50% 75% max
Age 5000.0 NaN NaN NaN 45.3384 11.463166 23.0 35.0 45.0 55.0 67.0
Experience 5000.0 NaN NaN NaN 20.3276 11.253035 0.0 11.0 20.0 30.0 43.0
Income 5000.0 NaN NaN NaN 73.7742 46.033729 8.0 39.0 64.0 98.0 224.0
Family 5000.0 NaN NaN NaN 2.3964 1.147663 1.0 1.0 2.0 3.0 4.0
CCAvg 5000.0 NaN NaN NaN 1.937938 1.747659 0.0 0.7 1.5 2.5 10.0
Education 5000.0 NaN NaN NaN 1.881 0.839869 1.0 1.0 2.0 3.0 3.0
Mortgage 5000.0 NaN NaN NaN 56.4988 101.713802 0.0 0.0 0.0 101.0 635.0
Personal_Loan 5000.0 NaN NaN NaN 0.096 0.294621 0.0 0.0 0.0 0.0 1.0
Securities_Account 5000.0 NaN NaN NaN 0.1044 0.305809 0.0 0.0 0.0 0.0 1.0
CD_Account 5000.0 NaN NaN NaN 0.0604 0.23825 0.0 0.0 0.0 0.0 1.0
Online 5000.0 NaN NaN NaN 0.5968 0.490589 0.0 0.0 1.0 1.0 1.0
CreditCard 5000.0 NaN NaN NaN 0.294 0.455637 0.0 0.0 0.0 1.0 1.0
City 5000 245 Los Angeles 375 NaN NaN NaN NaN NaN NaN NaN
State 5000 2 CA 4994 NaN NaN NaN NaN NaN NaN NaN

EDA Final Observations:

Key observations on the univariate analysis:

  1. The minimum Age is 23, the maximum is 67, and the average is 45 years

  2. The Age distribution is roughly uniform

  3. The minimum Experience is 0 years, the maximum is 43 years, and the mean is approx. 20 years

  4. The minimum CCAvg is USD 0 (reflecting customers who do not use credit cards), the maximum is USD 10,000, and the average is approx. USD 1,900 per month

  5. 69% of the customers are not paying a mortgage; the remaining 31% pay mortgages ranging from approx. USD 99,000 to USD 635,000, with a mean between USD 180,000 and 200,000

  6. The top 5 cities are Los Angeles (1st), San Diego (2nd), San Francisco (3rd), Berkeley (4th) and Sacramento (5th)

  7. The top state is CA

  8. The most common family size is 1 (29.4%), followed by size 2 (25.9%), size 4 (24.4%) and size 3 (20.2%)

  9. Customer education is distributed as follows: 41.9% are 1: Undergrad, 28.1% are 2: Graduate, and 30.0% are 3: Advanced/Professional

  10. 94% of customers do not have a CD account and only 6% have one

  11. 40.3% of customers do not use online banking services and 59.7% do

  12. 70.6% of customers do not hold a credit card from another bank and 29.4% do

  13. The target variable Personal_Loan shows that 90.4% of customers did not accept a loan and 9.6% accepted one

Key Insights and Observations on the multivariate analysis:

Correlation between variables

  1. The correlation between Age and Experience is very high (value = 0.98)

  2. The correlation between Income and CCAvg is lower (value = 0.65)

  3. All other correlation values are quite small for further consideration

The effect of variables on the target variable

Variable: effect on the target variable (Personal_Loan)

  - Income: The higher the income, the more likely the customer is to accept a personal loan

  - CCAvg: The higher the customer's monthly credit-card spending, the more likely they are to accept a personal loan

  - Education: The higher the education level, the more willing customers are to accept a personal loan

  - Mortgage: The higher the mortgage value, the more likely the customer is to accept a personal loan

  - City: Santa Clara shows the highest ratio of customers willing to accept personal loans, followed by Pasadena and Irvine

  - State: Customers residing in CA are more willing to accept personal loans

  - Securities_Account: Customers with a Securities account are more willing to accept a personal loan

  - CD_Account: Customers with a CD account are more willing to accept a personal loan

  - Online: No clear effect on the target variable is observed

  - Age: No clear effect on the target variable is observed

  - Experience: No clear effect on the target variable is observed

  - CreditCard: No clear effect on the target variable is observed

Expected target dependencies

Medium to high importance is expected for these variables: Income, Family, CCAvg, Education, Mortgage, Securities_Account and CD_Account

Low importance is expected for these variables: Age, Experience, CreditCard and Online

Model Building¶

Data Preparation for modelling¶

Previously applied data pre-processing actions:

  1. ID column: dropped

  2. Experience column: negative values replaced by the median

  3. ZIP Code column: dropped after being split into City and State columns
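Step 2 of the recap above can be sketched as follows. This is an illustrative recreation on synthetic data, not the notebook's original cleaning cell, which runs earlier:

```python
import pandas as pd

# Synthetic Experience column containing an impossible negative value
demo = pd.DataFrame({"Experience": [-3.0, 5.0, 20.0, 30.0, 43.0]})

# Median of the valid (non-negative) values only
median_exp = demo.loc[demo["Experience"] >= 0, "Experience"].median()

# Replace negative entries with that median
demo.loc[demo["Experience"] < 0, "Experience"] = median_exp
print(demo["Experience"].tolist())  # [25.0, 5.0, 20.0, 30.0, 43.0]
```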

1. Creating the dummy variables¶

In [ ]:
#checking columns in the data
loan.columns
Out[ ]:
Index(['Age', 'Experience', 'Income', 'Family', 'CCAvg', 'Education',
       'Mortgage', 'Personal_Loan', 'Securities_Account', 'CD_Account',
       'Online', 'CreditCard', 'City', 'State'],
      dtype='object')
In [ ]:
#Creating a dummy dataframe for the model
df_model = pd.get_dummies(
    loan,
    columns=[
        "Education",
        "City",
        "State",
    ],
    drop_first=True,
)
df_model.head()
Out[ ]:
(Output: the first five rows of the 258-column dummy dataframe, consisting of the 11 original columns Age, Experience, Income, Family, CCAvg, Mortgage, Personal_Loan, Securities_Account, CD_Account, Online and CreditCard, plus boolean dummies Education_2, Education_3, 244 one-hot City_* columns and State_CA.)
In [ ]:
#checking the shape of the dataset
df_model.shape
Out[ ]:
(5000, 258)
In [ ]:
#checking the data types
df_model.dtypes.value_counts()
Out[ ]:
count
bool 247
int64 9
float64 2

Observation:

No object data types remain; apart from the boolean dummy columns, all columns are numeric, so the dataset is ready for modelling

2. Splitting the data for better suitability¶

In [ ]:
#creating train and test dataset
X = df_model.drop('Personal_Loan',axis=1)
y = df_model['Personal_Loan']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
In [ ]:
#Checking the train and test dataset sizes for confirmation
print(f'X_train shape:{X_train.shape}')
print(f'X_test shape:{X_test.shape}')
X_train shape:(4000, 257)
X_test shape:(1000, 257)
In [ ]:
#checking the percentage of target variable classes in the train and test sets
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
Percentage of classes in training set:
Personal_Loan
0    0.905
1    0.095
Name: proportion, dtype: float64
Percentage of classes in test set:
Personal_Loan
0    0.9
1    0.1
Name: proportion, dtype: float64

Observation:

This matches the univariate analysis above: class 1 makes up roughly 9.5% of the data, and the split preserved this proportion in both sets.
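With a rare positive class like this, the proportions came out close by chance; passing `stratify` to `train_test_split` guarantees they match exactly. A minimal sketch on synthetic data (not the bank dataset):

```python
# Hypothetical illustration: stratify=y forces the train/test class
# proportions to match the full dataset, which matters for rare classes.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(5000, 3))
y_demo = (rng.random(5000) < 0.095).astype(int)  # ~9.5% positive class

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1, stratify=y_demo
)
print(round(y_tr.mean(), 3), round(y_te.mean(), 3))  # near-identical proportions
```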

Model Evaluation Criterion¶

The model can make two kinds of wrong predictions:

  1. Predicting a customer will accept a loan when in reality they would not (false positive). Impact - loss of marketing resources.

  2. Predicting a customer will not accept a loan when in reality they would have (false negative). Impact - loss of opportunity.

Which case is more important?

Predicting that a customer will not accept a loan when they actually would have means the bank loses business it could have won, so the false negatives (FN) are the costlier error here.

How to reduce this loss (false negatives)?

Recall should be maximized: the greater the recall, the fewer false negatives the model produces.

We will start by creating helper functions to compute the evaluation metrics and plot the confusion matrix for the models.

In [ ]:
##  Function to calculate recall score
def get_recall_score(model, predictors, target):
    """
    model: classifier
    predictors: independent variables
    target: dependent variable

    """
    prediction = model.predict(predictors)
    return recall_score(target, prediction)
In [ ]:
##  Function to calculate confusion matrix
def confusion_matrix_sklearn(model, predictors, target):
    """
    To plot the confusion_matrix with percentages

    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    y_pred = model.predict(predictors)
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)

    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")

Model Building¶

1. Logistic Regression¶

First, define the functions that support the performance evaluation of the logistic regression model. These are:

a. confusion_matrix_sklearn_with_threshold (builds the confusion matrix of the classification model)

b. model_performance_classification_sklearn_with_threshold (computes the different metrics)

c. plot_prec_recall_vs_tresh (plots precision and recall vs. threshold)

In [ ]:
# defining a function to plot the confusion_matrix of a classification model built using sklearn
def confusion_matrix_sklearn_with_threshold(model, predictors, target, threshold=0.5):
    """
    To plot the confusion_matrix, based on the threshold specified, with percentages

    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    pred_prob = model.predict_proba(predictors)[:, 1]
    y_pred = (pred_prob > threshold).astype(int)

    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)

    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
In [ ]:
# defining a function to compute different metrics to check performance of a classification model built using sklearn

def model_performance_classification_sklearn_with_threshold(model, predictors, target, threshold=0.5):
    """
    Function to compute different metrics, based on the threshold specified, to check classification model performance

    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """

    # predicting using the independent variables
    pred_prob = model.predict_proba(predictors)[:, 1]
    pred = (pred_prob > threshold).astype(int)

    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred)  # to compute Recall
    precision = precision_score(target, pred)  # to compute Precision
    f1 = f1_score(target, pred)  # to compute F1-score

    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {
            "Accuracy": acc,
            "Recall": recall,
            "Precision": precision,
            "F1": f1,
        },
        index=[0],
    )

    return df_perf
In [ ]:
# defining a function to compute thresholds
def plot_prec_recall_vs_tresh(precisions, recalls, thresholds):
    plt.plot(thresholds, precisions[:-1], "b--", label="precision")
    plt.plot(thresholds, recalls[:-1], "g--", label="recall")
    plt.xlabel("Threshold")
    plt.legend(loc="upper left")
    plt.ylim([0, 1])
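The three arrays this helper expects come from sklearn's `precision_recall_curve`. A small sketch on synthetic scores (not the bank data) showing their shapes, and why the helper slices `[:-1]`:

```python
# Synthetic example: precision_recall_curve returns precisions and recalls
# with one more element than thresholds, hence the [:-1] slicing above.
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
scores = np.clip(0.4 * y_true + 0.6 * rng.random(200), 0, 1)  # noisy scores

precisions, recalls, thresholds = precision_recall_curve(y_true, scores)
print(len(precisions), len(recalls), len(thresholds))
```

These three arrays are exactly what `plot_prec_recall_vs_tresh` consumes.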
In [ ]:
# There are different solvers available in Sklearn logistic regression
# The newton-cg solver is applied as it is faster for high-dimensional data

lg = LogisticRegression(solver="newton-cg", random_state=1)
model = lg.fit(X_train, y_train)

Examining the coefficients¶

In [ ]:
# checking the coefficients and intercept of the model
coef_df = pd.DataFrame(np.append(lg.coef_, lg.intercept_),
    index=X_train.columns.tolist() + ["Intercept"],
    columns=["Coefficients"],
)
coef_df
Out[ ]:
Coefficients
Age 0.073051
Experience -0.068624
Income 0.058481
Family 0.638125
CCAvg 0.161678
... ...
City_Woodland Hills 0.755017
City_Yorba Linda -0.036893
City_Yucaipa -0.006452
State_CA -0.006848
Intercept -14.691445

258 rows × 1 columns
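As a reminder of how these coefficients enter the prediction, here is a minimal sketch on synthetic data (not the bank dataset) verifying that `predict_proba` is the sigmoid of the linear score built from the coefficients and intercept:

```python
# Sketch: for binary logistic regression, P(y=1) = sigmoid(X @ coef + intercept),
# which is what the coefficient table above feeds into.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

clf = LogisticRegression().fit(X, y)
manual = 1 / (1 + np.exp(-(X @ clf.coef_[0] + clf.intercept_[0])))
print(np.allclose(manual, clf.predict_proba(X)[:, 1]))  # True
```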

Coefficient interpretations¶

In [ ]:
#sorting by descending order to recognize the more powerful variables
coef_df.sort_values(by='Coefficients',ascending=False)
Out[ ]:
Coefficients
Education_3 3.710503
Education_2 3.472068
CD_Account 3.139787
City_Los Gatos 1.214717
City_Whittier 0.915968
... ...
CreditCard -0.831228
City_Carlsbad -0.868982
City_Tustin -0.870770
City_Livermore -0.885393
Intercept -14.691445

258 rows × 1 columns

In [ ]:
#The positive coefficients
coef_df[coef_df['Coefficients']>0].sort_values(by='Coefficients',ascending=False)
Out[ ]:
Coefficients
Education_3 3.710503
Education_2 3.472068
CD_Account 3.139787
City_Los Gatos 1.214717
City_Whittier 0.915968
City_Martinez 0.894860
City_Oak View 0.888115
City_Greenbrae 0.858732
City_Cardiff By The Sea 0.819786
City_Irvine 0.790847
City_West Sacramento 0.785846
City_Woodland Hills 0.755017
City_Richmond 0.743801
City_Banning 0.733911
City_Campbell 0.730927
City_San Ysidro 0.720148
City_Sunnyvale 0.710214
City_Venice 0.709283
City_Novato 0.695282
City_Bakersfield 0.680077
City_Torrance 0.672859
City_Glendale 0.663426
City_Moss Landing 0.646149
Family 0.638125
City_Seaside 0.556037
City_San Juan Capistrano 0.522589
City_Los Altos 0.489764
City_El Sobrante 0.489080
City_Reseda 0.489041
City_San Jose 0.482935
City_Fairfield 0.473225
City_Thousand Oaks 0.461609
City_Valencia 0.460971
City_Calabasas 0.458693
City_Ridgecrest 0.458088
City_Fawnskin 0.426964
City_Rohnert Park 0.426482
City_Beverly Hills 0.418402
City_Lomita 0.405845
City_Vallejo 0.405464
City_Riverside 0.394553
City_Santa Clarita 0.366339
City_Cypress 0.348640
City_Montebello 0.340062
City_San Clemente 0.337455
City_Placentia 0.316495
City_Chula Vista 0.302450
City_Roseville 0.291509
City_Laguna Niguel 0.283122
City_Santa Barbara 0.264081
City_Santa Cruz 0.242712
City_Hayward 0.233567
City_Pasadena 0.216142
City_Berkeley 0.211861
City_San Diego 0.205123
City_Stanford 0.202574
City_Fullerton 0.173889
CCAvg 0.161678
City_Fremont 0.160599
City_Sherman Oaks 0.144131
City_Huntington Beach 0.137023
City_Elk Grove 0.123813
City_Capitola 0.105287
City_San Luis Rey 0.091535
City_Monrovia 0.091074
City_Sacramento 0.089503
City_Oceanside 0.076763
City_San Francisco 0.074668
Age 0.073051
City_Walnut Creek 0.063006
City_San Bernardino 0.062996
City_Ventura 0.062261
Income 0.058481
City_Eureka 0.053431
City_La Jolla 0.049407
City_Costa Mesa 0.046454
City_Los Angeles 0.046066
City_Carpinteria 0.037365
City_Camarillo 0.033949
City_Norwalk 0.024411
City_Bella Vista 0.010442
City_Palo Alto 0.002611
Mortgage 0.000971
City_Pleasanton 0.000951

Observation - positive coefficients:

The table above shows all variables with positive coefficients, i.e. as the value of such a variable increases, the probability that the customer accepts a personal loan increases.

The top 5 variables affecting the target variable are:

  1. Education_3
  2. Education_2
  3. CD_Account
  4. City_Los Gatos
  5. City_Whittier
In [ ]:
#The Negative coefficients
coef_df[coef_df['Coefficients']<0].sort_values(by='Coefficients')
Out[ ]:
Coefficients
Intercept -14.691445
City_Livermore -0.885393
City_Tustin -0.870770
City_Carlsbad -0.868982
CreditCard -0.831228
City_Manhattan Beach -0.812330
City_Alhambra -0.740918
Securities_Account -0.704059
City_Davis -0.673413
City_Milpitas -0.648076
City_Loma Linda -0.635195
City_Menlo Park -0.601949
City_Redwood City -0.574392
City_Monterey -0.571322
Online -0.559086
City_Oakland -0.555836
City_Brisbane -0.542601
City_Fallbrook -0.529842
City_Redondo Beach -0.521048
City_Emeryville -0.495187
City_Northridge -0.478665
City_Arcata -0.469662
City_Santa Clara -0.455029
City_North Hollywood -0.440298
City_Studio City -0.424727
City_Diamond Bar -0.421248
City_Rancho Cordova -0.411055
City_Burlingame -0.376264
City_South San Francisco -0.359612
City_Mountain View -0.353475
City_Newbury Park -0.348923
City_Boulder Creek -0.345930
City_Santa Ana -0.335801
City_Fresno -0.330912
City_San Anselmo -0.324996
City_San Luis Obispo -0.322205
City_Mission Viejo -0.321912
City_Bonita -0.321378
City_Chatsworth -0.308760
City_Culver City -0.299640
City_San Marcos -0.296642
City_Laguna Hills -0.279902
City_Anaheim -0.277668
City_Redlands -0.275373
City_Goleta -0.275021
City_Merced -0.273667
City_Redding -0.263811
City_Orange -0.247645
City_Pomona -0.236133
City_La Mesa -0.233303
City_Santa Ynez -0.230488
City_Sanger -0.227368
City_Arcadia -0.226941
City_Canoga Park -0.225762
City_Hermosa Beach -0.225225
City_Marina -0.224952
City_Modesto -0.221858
City_La Palma -0.219094
City_Palos Verdes Peninsula -0.203787
City_Salinas -0.203025
City_Antioch -0.189764
City_National City -0.186494
City_San Leandro -0.178120
City_Capistrano Beach -0.175551
City_Alameda -0.174646
City_Sunland -0.170529
City_Chino Hills -0.166438
City_West Covina -0.164322
City_San Ramon -0.152015
City_Montclair -0.149160
City_San Juan Bautista -0.147283
City_San Mateo -0.146367
City_Ojai -0.144069
City_Rosemead -0.131931
City_Simi Valley -0.130683
City_Tehachapi -0.129535
City_Sonora -0.123752
City_Highland -0.121139
City_Seal Beach -0.120118
City_Lake Forest -0.117873
City_Monterey Park -0.114071
City_Van Nuys -0.106638
City_Sierra Madre -0.106416
City_Ukiah -0.106321
City_Poway -0.103103
City_Citrus Heights -0.102328
City_Rancho Cucamonga -0.098986
City_Hopland -0.093481
City_Newport Beach -0.089097
City_South Lake Tahoe -0.087573
City_Chico -0.086325
City_Los Alamitos -0.084993
City_Portola Valley -0.083165
City_Porter Ranch -0.082931
City_Santa Monica -0.078165
City_San Bruno -0.075458
City_Albany -0.073917
City_Baldwin Park -0.073611
City_Aptos -0.073277
City_El Dorado Hills -0.072933
City_La Mirada -0.072853
City_Glendora -0.071343
City_Larkspur -0.070541
Experience -0.068624
City_San Gabriel -0.065439
City_Hollister -0.064162
City_Trinity Center -0.064063
City_Bodega Bay -0.063711
City_Belvedere Tiburon -0.061032
City_March Air Reserve Base -0.060479
City_Rio Vista -0.059178
City_Concord -0.057318
City_Cupertino -0.056513
City_Westlake Village -0.055226
City_Gilroy -0.053535
City_Pleasant Hill -0.052104
City_Claremont -0.044581
City_Long Beach -0.038953
City_Yorba Linda -0.036893
City_Crestline -0.036679
City_South Gate -0.036029
City_Mission Hills -0.035818
City_Daly City -0.031724
City_Brea -0.031526
City_Inglewood -0.031001
City_Castro Valley -0.028638
City_Folsom -0.026668
City_Santa Rosa -0.026281
City_Carson -0.025603
City_Montague -0.024715
City_San Rafael -0.022831
City_Saratoga -0.022165
City_Clovis -0.019849
City_Danville -0.019296
City_Belmont -0.018305
City_Stockton -0.017588
City_Clearlake -0.017547
City_Weed -0.017243
City_Hawthorne -0.015285
City_Hacienda Heights -0.012823
City_El Segundo -0.012718
City_Moraga -0.011544
City_Imperial -0.011336
City_Rudno nad Hronom -0.011115
City_San Pablo -0.010608
City_San Dimas -0.008629
City_Sylmar -0.008480
City_Ben Lomond -0.008379
State_CA -0.006848
City_Signal Hill -0.006497
City_Yucaipa -0.006452
City_Escondido -0.005825
City_Morgan Hill -0.005472
City_Napa -0.005363
City_Vista -0.004620
City_Encinitas -0.004001
City_Upland -0.002961
City_North Hills -0.002844
City_Alamo -0.002776
City_Pacific Grove -0.002188
City_Edwards -0.001852
City_South Pasadena -0.001376
City_Garden Grove -0.001229
City_Rancho Palos Verdes -0.000939
City_Chino -0.000763
City_Lompoc -0.000517
City_Oxnard -0.000512
City_Pacific Palisades -0.000317
City_Sausalito -0.000302
City_Tahoe City -0.000186
City_Half Moon Bay -0.000064

Observation - negative coefficients:

The table above shows all variables with negative coefficients, i.e. as the value of such a variable increases, the probability that the customer accepts a personal loan decreases.

The top 5 variables affecting the target variable are:

  1. City_Livermore
  2. City_Tustin
  3. City_Carlsbad
  4. CreditCard
  5. City_Manhattan Beach

Interpreting the coefficients as odds

In [ ]:
# converting coefficients to odds
odds = np.exp(lg.coef_[0])

# finding the percentage change
perc_change_odds = (np.exp(lg.coef_[0]) - 1) * 100

# removing limit from number of columns to display
pd.set_option("display.max_columns", None)

# adding the odds to a dataframe
pd.DataFrame({"Odds": odds, "Change_odd%": perc_change_odds}, index=X_train.columns).sort_values(by='Odds',ascending=False)
Out[ ]:
Odds Change_odd%
Education_3 40.874366 3987.436574
Education_2 32.203258 3120.325806
CD_Account 23.098949 2209.894917
City_Los Gatos 3.369340 236.934039
City_Whittier 2.499195 149.919451
... ... ...
City_Manhattan Beach 0.443823 -55.617718
CreditCard 0.435514 -56.448568
City_Carlsbad 0.419378 -58.062174
City_Tustin 0.418629 -58.137092
City_Livermore 0.412552 -58.744819

257 rows × 2 columns

Overall observation

A. The top 5 variables affecting the target variable positively are:

  1. Education_3 : increases the customer's odds of accepting a personal loan by about 41 times
  2. Education_2 : increases the odds by about 32 times
  3. CD_Account : increases the odds by about 23 times
  4. City_Los Gatos : increases the odds by about 3.4 times
  5. City_Whittier : increases the odds by about 2.5 times

B. The top 5 variables affecting the target variable negatively are:

  1. City_Livermore : multiplies the odds by about 0.41 (a ~59% decrease)
  2. City_Tustin : multiplies the odds by about 0.42 (a ~58% decrease)
  3. City_Carlsbad : multiplies the odds by about 0.42 (a ~58% decrease)
  4. CreditCard : multiplies the odds by about 0.44 (a ~56% decrease)
  5. City_Manhattan Beach : multiplies the odds by about 0.44 (a ~56% decrease)

Model Performance Evaluation - Logistic Regression¶

Checking model performance on training set, threshold = 0.5

In [ ]:
# creating confusion matrix
confusion_matrix_sklearn_with_threshold(lg, X_train, y_train)
(confusion matrix plot for the training set)

Computing the performance metrics on the training set

In [ ]:
log_reg_model_train_perf = model_performance_classification_sklearn_with_threshold(
    lg, X_train, y_train
)

print("Training Set performance:")
log_reg_model_train_perf
Training Set performance:
Out[ ]:
Accuracy Recall Precision F1
0 0.96325 0.689474 0.900344 0.780924
In [ ]:
# creating confusion matrix
confusion_matrix_sklearn_with_threshold(lg, X_test, y_test)
(confusion matrix plot for the test set)
In [ ]:
log_reg_model_test_perf = model_performance_classification_sklearn_with_threshold(
    lg, X_test, y_test
)

print("Test set performance:")
log_reg_model_test_perf
Test set performance:
Out[ ]:
Accuracy Recall Precision F1
0 0.954 0.64 0.864865 0.735632

Observation:

Recall on the training and test sets is close in value, 69% and 64% respectively, but the model's performance needs improvement: as the confusion matrix also shows, a substantial share of customers who would actually accept a loan are still being missed.

Model Performance Improvement¶

Let us lower the classification threshold to 0.10 and check model performance on the training and test sets.

In [ ]:
# creating confusion matrix for train set
confusion_matrix_sklearn_with_threshold(
    lg, X_train, y_train, threshold=0.10
)
(confusion matrix plot for the training set at threshold 0.10)
In [ ]:
# checking model performance for this model
log_reg_model_train_perf_threshold_curve = model_performance_classification_sklearn_with_threshold(
    lg, X_train, y_train, threshold=0.10
)
print("Training performance with the threshold deduced from the precision-recall curve:")
log_reg_model_train_perf_threshold_curve
Training performance with the threshold deduced from the precision-recall curve:
Out[ ]:
Accuracy Recall Precision F1
0 0.9045 0.913158 0.498563 0.644981

Checking model performance on test set

In [ ]:
# creating confusion matrix
confusion_matrix_sklearn_with_threshold(
    lg, X_test, y_test, threshold=0.10
)
(confusion matrix plot for the test set at threshold 0.10)
In [ ]:
# checking model performance for this model
log_reg_model_test_perf_threshold_curve = model_performance_classification_sklearn_with_threshold(
    lg, X_test, y_test, threshold=0.10
)
print("Test performance with the threshold deduced from the precision-recall curve:")
log_reg_model_test_perf_threshold_curve
Test performance with the threshold deduced from the precision-recall curve:
Out[ ]:
Accuracy Recall Precision F1
0 0.902 0.87 0.505814 0.639706

Observation and conclusion:

At threshold 0.10, recall on the training and test sets rises to 91% and 87% respectively, which is good performance: the logistic regression model now misses only a small fraction of would-be borrowers (the false negatives drop to about 1.3% of the test set) while still maintaining a precision of approximately 50% on both sets.
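Rather than guessing a threshold, one could pick it directly from the precision-recall curve, e.g. the largest threshold that still achieves a target recall. A hypothetical sketch on synthetic scores (not the bank data):

```python
# Pick the largest threshold whose recall still meets a target (e.g. 0.90).
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=500)
scores = np.clip(0.5 * y_true + 0.5 * rng.random(500), 0, 1)

precisions, recalls, thresholds = precision_recall_curve(y_true, scores)
target = 0.90
# recall is non-increasing as the threshold grows, so the last index that
# meets the target gives the largest qualifying threshold
ok = np.where(recalls[:-1] >= target)[0]
best_threshold = thresholds[ok[-1]]
print(best_threshold)
```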

Let us apply a sequential feature selector¶

In [ ]:
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

# to plot the performance with addition of each feature
from mlxtend.plotting import plot_sequential_feature_selection as plot_sfs
In [ ]:
# from sklearn.linear_model import LogisticRegression

# Fit the model on train
model = LogisticRegression(solver="newton-cg", n_jobs=-1, random_state=1, max_iter=100)
In [ ]:
X_train.shape
Out[ ]:
(4000, 257)
In [ ]:
# we will first build the model with all variables
sfs = SFS(
    model,
    k_features=257,
    forward=True,
    floating=False,
    scoring="f1",
    verbose=2,
    cv=2,
    n_jobs=-1,
)

sfs = sfs.fit(X_train, y_train)
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
(repetitive joblib worker/timing lines trimmed; the forward-selection score after each added feature was:)

Features: 1/257 -- score: 0.4105
Features: 2/257 -- score: 0.5390
Features: 3/257 -- score: 0.5879
Features: 4/257 -- score: 0.7075
Features: 5/257 -- score: 0.7233
Features: 6/257 -- score: 0.7377
Features: 7/257 -- score: 0.7416
Features: 8/257 -- score: 0.7454
Features: 9/257 -- score: 0.7482
Features: 10/257 -- score: 0.7493
Features: 11/257 -- score: 0.7513
Features: 12/257 -- score: 0.7530
Features: 13/257 -- score: 0.7548
Features: 14/257 -- score: 0.7567
Features: 15/257 -- score: 0.7574
Features: 16/257 through 31/257 -- score: 0.7585 (plateau)
Features: 32/257 through 46/257 -- score: 0.7593 (plateau)
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 196 out of 211 | elapsed:    4.3s remaining:    0.3s
[Parallel(n_jobs=-1)]: Done 211 out of 211 | elapsed:    4.5s finished

[2025-11-17 04:30:47] Features: 47/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 210 out of 210 | elapsed:    4.5s finished

[2025-11-17 04:30:52] Features: 48/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 194 out of 209 | elapsed:    4.4s remaining:    0.3s
[Parallel(n_jobs=-1)]: Done 209 out of 209 | elapsed:    4.6s finished

[2025-11-17 04:30:57] Features: 49/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 208 out of 208 | elapsed:    4.5s finished

[2025-11-17 04:31:01] Features: 50/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 192 out of 207 | elapsed:    4.6s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done 207 out of 207 | elapsed:    4.9s finished

[2025-11-17 04:31:06] Features: 51/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 191 out of 206 | elapsed:    4.3s remaining:    0.3s
[Parallel(n_jobs=-1)]: Done 206 out of 206 | elapsed:    4.5s finished

[2025-11-17 04:31:11] Features: 52/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 190 out of 205 | elapsed:    4.3s remaining:    0.3s
[Parallel(n_jobs=-1)]: Done 205 out of 205 | elapsed:    4.6s finished

[2025-11-17 04:31:15] Features: 53/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 204 out of 204 | elapsed:    4.9s finished

[2025-11-17 04:31:20] Features: 54/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 203 out of 203 | elapsed:    4.7s finished

[2025-11-17 04:31:25] Features: 55/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 187 out of 202 | elapsed:    4.4s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done 202 out of 202 | elapsed:    4.7s finished

[2025-11-17 04:31:29] Features: 56/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 201 out of 201 | elapsed:    4.6s finished

[2025-11-17 04:31:34] Features: 57/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 200 out of 200 | elapsed:    4.7s finished

[2025-11-17 04:31:39] Features: 58/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 184 out of 199 | elapsed:    4.7s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done 199 out of 199 | elapsed:    5.0s finished

[2025-11-17 04:31:44] Features: 59/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 198 out of 198 | elapsed:    4.6s finished

[2025-11-17 04:31:49] Features: 60/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 182 out of 197 | elapsed:    4.6s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done 197 out of 197 | elapsed:    4.9s finished

[2025-11-17 04:31:53] Features: 61/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 196 out of 196 | elapsed:    4.7s finished

[2025-11-17 04:31:58] Features: 62/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 180 out of 195 | elapsed:    4.5s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done 195 out of 195 | elapsed:    4.7s finished

[2025-11-17 04:32:03] Features: 63/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 194 out of 194 | elapsed:    4.9s finished

[2025-11-17 04:32:08] Features: 64/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 178 out of 193 | elapsed:    4.5s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done 193 out of 193 | elapsed:    4.7s finished

[2025-11-17 04:32:13] Features: 65/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 192 out of 192 | elapsed:    4.9s finished

[2025-11-17 04:32:17] Features: 66/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 191 out of 191 | elapsed:    4.9s finished

[2025-11-17 04:32:22] Features: 67/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 175 out of 190 | elapsed:    4.4s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done 190 out of 190 | elapsed:    4.8s finished

[2025-11-17 04:32:27] Features: 68/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 174 out of 189 | elapsed:    4.7s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done 189 out of 189 | elapsed:    5.1s finished

[2025-11-17 04:32:32] Features: 69/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 188 out of 188 | elapsed:    4.8s finished

[2025-11-17 04:32:37] Features: 70/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 187 out of 187 | elapsed:    5.1s finished

[2025-11-17 04:32:42] Features: 71/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 171 out of 186 | elapsed:    4.5s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done 186 out of 186 | elapsed:    4.8s finished

[2025-11-17 04:32:47] Features: 72/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.3s
[Parallel(n_jobs=-1)]: Done 170 out of 185 | elapsed:    4.6s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done 185 out of 185 | elapsed:    4.8s finished

[2025-11-17 04:32:52] Features: 73/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.3s
[Parallel(n_jobs=-1)]: Done 169 out of 184 | elapsed:    4.8s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done 184 out of 184 | elapsed:    5.1s finished

[2025-11-17 04:32:57] Features: 74/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed:    4.7s
[Parallel(n_jobs=-1)]: Done 183 out of 183 | elapsed:    5.8s finished

[2025-11-17 04:33:03] Features: 75/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    0.9s
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed:    4.9s
[Parallel(n_jobs=-1)]: Done 182 out of 182 | elapsed:    6.0s finished

[2025-11-17 04:33:09] Features: 76/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 166 out of 181 | elapsed:    4.6s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done 181 out of 181 | elapsed:    4.8s finished

[2025-11-17 04:33:14] Features: 77/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    0.9s
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed:    4.9s
[Parallel(n_jobs=-1)]: Done 180 out of 180 | elapsed:    5.9s finished

[2025-11-17 04:33:19] Features: 78/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    0.9s
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed:    4.8s
[Parallel(n_jobs=-1)]: Done 179 out of 179 | elapsed:    5.7s finished

[2025-11-17 04:33:25] Features: 79/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 178 out of 178 | elapsed:    4.9s finished

[2025-11-17 04:33:30] Features: 80/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed:    4.8s
[Parallel(n_jobs=-1)]: Done 177 out of 177 | elapsed:    5.8s finished

[2025-11-17 04:33:36] Features: 81/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.3s
[Parallel(n_jobs=-1)]: Done 176 out of 176 | elapsed:    4.8s finished

[2025-11-17 04:33:41] Features: 82/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed:    5.0s
[Parallel(n_jobs=-1)]: Done 175 out of 175 | elapsed:    5.8s finished

[2025-11-17 04:33:47] Features: 83/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    0.9s
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed:    4.7s
[Parallel(n_jobs=-1)]: Done 174 out of 174 | elapsed:    5.6s finished

[2025-11-17 04:33:52] Features: 84/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed:    5.0s
[Parallel(n_jobs=-1)]: Done 173 out of 173 | elapsed:    5.8s finished

[2025-11-17 04:33:58] Features: 85/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.3s
[Parallel(n_jobs=-1)]: Done 157 out of 172 | elapsed:    4.5s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done 172 out of 172 | elapsed:    4.8s finished

[2025-11-17 04:34:03] Features: 86/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed:    5.1s
[Parallel(n_jobs=-1)]: Done 171 out of 171 | elapsed:    5.8s finished

[2025-11-17 04:34:09] Features: 87/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.3s
[Parallel(n_jobs=-1)]: Done 170 out of 170 | elapsed:    4.8s finished

[2025-11-17 04:34:13] Features: 88/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed:    5.0s
[Parallel(n_jobs=-1)]: Done 169 out of 169 | elapsed:    5.8s finished

[2025-11-17 04:34:19] Features: 89/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.3s
[Parallel(n_jobs=-1)]: Done 153 out of 168 | elapsed:    4.5s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done 168 out of 168 | elapsed:    4.8s finished

[2025-11-17 04:34:24] Features: 90/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed:    5.0s
[Parallel(n_jobs=-1)]: Done 167 out of 167 | elapsed:    5.7s finished

[2025-11-17 04:34:30] Features: 91/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed:    4.9s
[Parallel(n_jobs=-1)]: Done 166 out of 166 | elapsed:    5.5s finished

[2025-11-17 04:34:35] Features: 92/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.3s
[Parallel(n_jobs=-1)]: Done 150 out of 165 | elapsed:    4.5s remaining:    0.5s
[Parallel(n_jobs=-1)]: Done 165 out of 165 | elapsed:    4.8s finished

[2025-11-17 04:34:40] Features: 93/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed:    5.2s
[Parallel(n_jobs=-1)]: Done 164 out of 164 | elapsed:    5.7s finished

[2025-11-17 04:34:46] Features: 94/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed:    5.0s
[Parallel(n_jobs=-1)]: Done 163 out of 163 | elapsed:    5.5s finished

[2025-11-17 04:34:51] Features: 95/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.4s
[Parallel(n_jobs=-1)]: Done 162 out of 162 | elapsed:    5.0s finished

[2025-11-17 04:34:56] Features: 96/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed:    5.1s
[Parallel(n_jobs=-1)]: Done 161 out of 161 | elapsed:    5.5s finished

[2025-11-17 04:35:02] Features: 97/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 160 out of 160 | elapsed:    5.7s finished

[2025-11-17 04:35:07] Features: 98/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 159 out of 159 | elapsed:    5.5s finished

[2025-11-17 04:35:13] Features: 99/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 158 out of 158 | elapsed:    5.4s finished

[2025-11-17 04:35:18] Features: 100/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 157 out of 157 | elapsed:    5.5s finished

[2025-11-17 04:35:24] Features: 101/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 156 out of 156 | elapsed:    5.4s finished

[2025-11-17 04:35:29] Features: 102/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 155 out of 155 | elapsed:    5.6s finished

[2025-11-17 04:35:35] Features: 103/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 154 out of 154 | elapsed:    5.3s finished

[2025-11-17 04:35:40] Features: 104/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 153 out of 153 | elapsed:    5.6s finished

[2025-11-17 04:35:46] Features: 105/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 152 out of 152 | elapsed:    5.3s finished

[2025-11-17 04:35:51] Features: 106/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 151 out of 151 | elapsed:    5.6s finished

[2025-11-17 04:35:57] Features: 107/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed:    5.2s finished

[2025-11-17 04:36:02] Features: 108/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 149 out of 149 | elapsed:    5.4s finished

[2025-11-17 04:36:07] Features: 109/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 148 out of 148 | elapsed:    5.3s finished

[2025-11-17 04:36:13] Features: 110/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 147 out of 147 | elapsed:    5.3s finished

[2025-11-17 04:36:18] Features: 111/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 146 out of 146 | elapsed:    5.4s finished

[2025-11-17 04:36:23] Features: 112/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 145 out of 145 | elapsed:    5.2s finished

[2025-11-17 04:36:29] Features: 113/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 144 out of 144 | elapsed:    5.4s finished

[2025-11-17 04:36:34] Features: 114/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 143 out of 143 | elapsed:    5.2s finished

[2025-11-17 04:36:39] Features: 115/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 142 out of 142 | elapsed:    5.2s finished

[2025-11-17 04:36:44] Features: 116/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 141 out of 141 | elapsed:    5.0s finished

[2025-11-17 04:36:49] Features: 117/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 140 out of 140 | elapsed:    5.0s finished

[2025-11-17 04:36:54] Features: 118/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 139 out of 139 | elapsed:    5.2s finished

[2025-11-17 04:37:00] Features: 119/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 138 out of 138 | elapsed:    5.0s finished

[2025-11-17 04:37:05] Features: 120/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 137 out of 137 | elapsed:    5.2s finished

[2025-11-17 04:37:10] Features: 121/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.5s
[Parallel(n_jobs=-1)]: Done 136 out of 136 | elapsed:    4.4s finished

[2025-11-17 04:37:14] Features: 122/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 135 out of 135 | elapsed:    5.1s finished

[2025-11-17 04:37:19] Features: 123/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 134 out of 134 | elapsed:    5.0s finished

[2025-11-17 04:37:24] Features: 124/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 133 out of 133 | elapsed:    4.9s finished

[2025-11-17 04:37:29] Features: 125/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 132 out of 132 | elapsed:    5.1s finished

[2025-11-17 04:37:34] Features: 126/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 131 out of 131 | elapsed:    4.9s finished

[2025-11-17 04:37:39] Features: 127/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 130 out of 130 | elapsed:    5.1s finished

[2025-11-17 04:37:44] Features: 128/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 129 out of 129 | elapsed:    4.9s finished

[2025-11-17 04:37:49] Features: 129/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 128 out of 128 | elapsed:    4.9s finished

[2025-11-17 04:37:54] Features: 130/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 127 out of 127 | elapsed:    5.1s finished

[2025-11-17 04:37:59] Features: 131/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 126 out of 126 | elapsed:    4.8s finished

[2025-11-17 04:38:04] Features: 132/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 125 out of 125 | elapsed:    5.0s finished

[2025-11-17 04:38:09] Features: 133/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 124 out of 124 | elapsed:    4.7s finished

[2025-11-17 04:38:14] Features: 134/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 123 out of 123 | elapsed:    4.7s finished

[2025-11-17 04:38:18] Features: 135/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 122 out of 122 | elapsed:    4.9s finished

[2025-11-17 04:38:23] Features: 136/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 121 out of 121 | elapsed:    4.7s finished

[2025-11-17 04:38:28] Features: 137/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 120 out of 120 | elapsed:    4.9s finished

[2025-11-17 04:38:33] Features: 138/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel/joblib progress logging truncated — forward selection continued from 139/257 up to 257/257 features. The CV F1 score stayed flat at 0.7593 from 139 through 232 features, then declined steadily as the remaining features were added.]

[2025-11-17 04:44:19] Features: 257/257 -- score: 0.7283814535627934
In [ ]:
fig1 = plot_sfs(sfs.get_metric_dict(), kind="std_dev", figsize=(40, 5))

plt.title("Sequential Forward Selection (w. StdDev)")
plt.xticks(rotation=90)
plt.show()
(plot: Sequential Forward Selection performance vs. number of features, with standard-deviation bands)

Observation:

The curve shows that model performance becomes roughly constant at around the 18th feature. Let us therefore fit a new SFS model with only 35 variables and display the top 18 features.
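To make the plateau in the curve above concrete, here is a minimal sketch of what forward selection does at each step: try every remaining feature, keep the one that most improves the cross-validated F1 score. This is an illustration on synthetic data (not the AllLife data, and not mlxtend's actual implementation):

```python
# Greedy forward feature selection scored by 2-fold CV F1 —
# a sketch of what SFS(forward=True) does internally.
# Synthetic data; feature indices and scores are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(
    n_samples=400, n_features=10, n_informative=4, random_state=42
)

def forward_select(X, y, estimator, k, scoring="f1", cv=2):
    remaining = list(range(X.shape[1]))
    selected, history = [], []
    while len(selected) < k:
        # Score each candidate feature added to the current subset.
        scores = {
            f: cross_val_score(
                estimator, X[:, selected + [f]], y, scoring=scoring, cv=cv
            ).mean()
            for f in remaining
        }
        best = max(scores, key=scores.get)  # greedily keep the best addition
        selected.append(best)
        remaining.remove(best)
        history.append(scores[best])
    return selected, history

selected, history = forward_select(X, y, LogisticRegression(max_iter=1000), k=5)
print(selected, [round(s, 3) for s in history])
```

The per-step scores in `history` typically rise quickly and then flatten, which is exactly the plateau the plot above shows and the reason for cutting the feature count well below 257.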

In [ ]:
sfs1 = SFS(
    model,
    k_features=35,    # stop after selecting 35 features
    forward=True,     # forward selection: add one feature at a time
    floating=False,   # no floating (conditional removal) steps
    scoring="f1",     # optimize F1, appropriate for the imbalanced target
    verbose=2,
    cv=2,
    n_jobs=-1,        # use all available cores
)

sfs1 = sfs1.fit(X_train, y_train)

fig1 = plot_sfs(sfs1.get_metric_dict(), kind="std_dev", figsize=(10, 5))

plt.title("Sequential Forward Selection (w. StdDev)")
plt.grid()
plt.show()
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
(per-step [Parallel] progress messages omitted; cross-validated F1 score at each step:)

Features:  1/35 -- score: 0.4104808235243018
Features:  2/35 -- score: 0.5389730931641541
Features:  3/35 -- score: 0.5878732445359096
Features:  4/35 -- score: 0.7075021174604891
Features:  5/35 -- score: 0.7233252563594652
Features:  6/35 -- score: 0.7376643013945693
Features:  7/35 -- score: 0.7416059055296409
Features:  8/35 -- score: 0.7453903774128494
Features:  9/35 -- score: 0.7481993661768942
Features: 10/35 -- score: 0.7492596605272661
Features: 11/35 -- score: 0.7512667056653017
Features: 12/35 -- score: 0.7530154000789748
Features: 13/35 -- score: 0.7547542978852884
Features: 14/35 -- score: 0.75674855929633
Features: 15/35 -- score: 0.757411777229743
Features: 16/35 -- score: 0.7584777425897591
Features: 17/35 to 31/35 -- score: 0.7584777425897591 (constant)
Features: 32/35 to 35/35 -- score: 0.7592638427268228 (constant)
No description has been provided for this image

The plateau is confirmed in this plot as well. Let us display the selected feature names.

Model Performance Evaluation - Simplified Model (35 variables)¶

In [ ]:
feat_cols = list(sfs1.k_feature_idx_)
print(feat_cols)
X_train.columns[feat_cols]
[2, 3, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 29, 37, 40, 42, 43, 100, 117, 119, 123, 149, 182, 206]
Out[ ]:
Index(['Income', 'Family', 'Securities_Account', 'CD_Account', 'Online',
       'CreditCard', 'Education_2', 'Education_3', 'City_Alameda',
       'City_Alamo', 'City_Albany', 'City_Alhambra', 'City_Anaheim',
       'City_Antioch', 'City_Aptos', 'City_Arcadia', 'City_Arcata',
       'City_Bakersfield', 'City_Baldwin Park', 'City_Banning',
       'City_Bella Vista', 'City_Belmont', 'City_Ben Lomond',
       'City_Beverly Hills', 'City_Camarillo', 'City_Capistrano Beach',
       'City_Cardiff By The Sea', 'City_Carlsbad', 'City_Irvine',
       'City_Los Angeles', 'City_Manhattan Beach', 'City_Menlo Park',
       'City_Oakland', 'City_Sacramento', 'City_Santa Barbara'],
      dtype='object')
In [ ]:
X_train_sfs = X_train[X_train.columns[feat_cols]]

# Creating a new X_test with the same variables that we selected for X_train
X_test_sfs = X_test[X_train_sfs.columns]

print(f'''X_train shape: {X_train_sfs.shape}
X_test shape: {X_test_sfs.shape}''')
X_train shape: (4000, 35)
X_test shape: (1000, 35)
In [ ]:
X_train_sfs
Out[ ]:
Income Family Securities_Account CD_Account Online CreditCard Education_2 Education_3 City_Alameda City_Alamo City_Albany City_Alhambra City_Anaheim City_Antioch City_Aptos City_Arcadia City_Arcata City_Bakersfield City_Baldwin Park City_Banning City_Bella Vista City_Belmont City_Ben Lomond City_Beverly Hills City_Camarillo City_Capistrano Beach City_Cardiff By The Sea City_Carlsbad City_Irvine City_Los Angeles City_Manhattan Beach City_Menlo Park City_Oakland City_Sacramento City_Santa Barbara
1233 22 2 0 0 1 0 False False False False False False False False False False False False False False False False False False False False False False False False False False False False False
1056 25 1 0 0 1 0 False True False False False False False False False False False False False False False False False False False False False False False False False False False False False
1686 39 4 1 0 1 0 False False False False False False False False False False False False False False False False False False False False False False False False False False False False False
187 159 3 0 0 1 0 False True False False False False False False False False False False False False False False False False False False False False False False False False False False False
3840 35 3 0 0 0 0 True False False False False False False False False False False False False False False False False False False False False False False False False False False False False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2895 39 4 0 0 1 0 True False True False False False False False False False False False False False False False False False False False False False False False False False False False False
2763 13 4 0 0 1 0 False False False False False False False False False False False False False False False False False False False False False False False False False False False False False
905 28 1 0 0 1 1 False False False False False False False False False False False False False False False False False False False False False False False False False False False False False
3980 89 4 0 0 1 0 True False False False False False False False False False False False False False False False False False False False False False False False False False False False False
235 71 4 0 0 1 0 False True False False False False False False False False False False False False False False False False False False False False False False False False False False False

4000 rows × 35 columns

In [ ]:
# Fitting logistic regession model

log_sfs = LogisticRegression(
    solver="newton-cg", penalty=None, verbose=True, n_jobs=-1, random_state=1
)

# Several solvers are available; we use 'newton-cg' here.
# penalty=None fits an unregularized logistic regression.

log_sfs.fit(X_train_sfs, y_train)
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
Out[ ]:
LogisticRegression(n_jobs=-1, penalty=None, random_state=1, solver='newton-cg',
                   verbose=True)

Model performance on training set¶

In [ ]:
confusion_matrix_sklearn_with_threshold(log_sfs, X_train_sfs, y_train)
No description has been provided for this image
In [ ]:
log_reg_model_train_perf_SFS = model_performance_classification_sklearn_with_threshold(
    log_sfs, X_train_sfs, y_train
)
print("Training performance:")
log_reg_model_train_perf_SFS
Training performance:
Out[ ]:
Accuracy Recall Precision F1
0 0.96275 0.7 0.883721 0.781204

Model performance on test set¶

In [ ]:
confusion_matrix_sklearn_with_threshold(log_sfs, X_test_sfs, y_test)
No description has been provided for this image
In [ ]:
log_reg_model_test_perf_SFS = model_performance_classification_sklearn_with_threshold(
    log_sfs, X_test_sfs, y_test
)
print("Test set performance:")
log_reg_model_test_perf_SFS
Test set performance:
Out[ ]:
Accuracy Recall Precision F1
0 0.953 0.66 0.835443 0.73743

With the default threshold of 0.5, recall remains low after feature selection, hence we will evaluate the same model with the threshold lowered to 0.1 and observe its performance
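Applying a custom probability threshold amounts to comparing the `predict_proba` output against the cutoff; the notebook's `model_performance_classification_sklearn_with_threshold` helper is assumed to do the equivalent internally. A minimal sketch on synthetic data:

```python
# Sketch of threshold-tuned classification on synthetic, imbalanced data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (rng.random(200) < 0.1).astype(int)   # imbalanced target, ~10% positives

clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba(X)[:, 1]           # probability of class 1
pred_default = (proba >= 0.5).astype(int)    # default threshold
pred_low = (proba >= 0.1).astype(int)        # lowered threshold

# Every sample flagged positive at 0.5 is also flagged at 0.1,
# so lowering the threshold can only keep or increase recall.
assert recall_score(y, pred_low) >= recall_score(y, pred_default)
```

This is why lowering the threshold trades precision for recall on an imbalanced target.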

In [ ]:
log_reg_model_train_perf_SFS_tre = model_performance_classification_sklearn_with_threshold(
    log_sfs, X_train_sfs, y_train,  threshold=0.1
)
print("Training performance:")
log_reg_model_train_perf_SFS_tre
Training performance:
Out[ ]:
Accuracy Recall Precision F1
0 0.9035 0.889474 0.495601 0.636535
In [ ]:
log_reg_model_test_perf_SFS_tre = model_performance_classification_sklearn_with_threshold(
    log_sfs, X_test_sfs, y_test, threshold=0.1
)

print("Test set performance:")
log_reg_model_test_perf_SFS_tre
Test set performance:
Out[ ]:
Accuracy Recall Precision F1
0 0.895 0.84 0.485549 0.615385

At threshold 0.1, recall rises to 0.8895 on the train set and 0.84 on the test set, though precision drops to roughly 0.49

Logistic Regression Models Comparison:

  Sr  Model name  Recall (train)  Recall (test)  Threshold  No. of variables

  1.  lg          0.6894          0.6500         0.50       257

  2.  lg          0.9131          0.8700         0.10       257

  3.  log_sfs     0.7000          0.6600         0.50       35

  4.  log_sfs     0.8895          0.8400         0.10       35

Observation: The best performing model is Model 2, the lg model tuned to a threshold of 0.10, using all 257 variables

Conclusion:

  1. A low threshold yields good model predictive performance; this is explained by the low percentage of class 1 customers (those who accept the personal loan) in the original data set.

This model will be used in the final comparison with the decision tree models

Model Building¶

2. Decision Tree¶

In addition to the get_recall_score and make_confusion_matrix functions, we will define a plot_tree function to visualize the decision trees for the models under study

In [ ]:
#function to plot a decision tree
def plot_tree(model, Predictor):
    feature_names = Predictor.columns.to_list()
    plt.figure(figsize=(20, 30))
    out = tree.plot_tree(
        model,
        feature_names=feature_names,
        filled=True,
        fontsize=9,
        node_ids=False,
        class_names=None,
    )
    # below code will add arrows to the decision tree split if they are missing
    for o in out:
        arrow = o.arrow_patch
        if arrow is not None:
            arrow.set_edgecolor("black")
            arrow.set_linewidth(1)
    plt.show()

Building the original tree T_o

The starting point is building the full tree with default hyperparameters and observing:

Model performance

Variable importances

Model improvement strategy

Note, however, that the class frequencies in the training set are imbalanced:

Class Frequency

0 - 0.905429

1 - 0.094571
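The frequency table above and the inverse class weights used in the cell below can be computed directly; here is a minimal sketch with a synthetic stand-in for `y_train` (the exact 0.905429/0.094571 split comes from the notebook's own training set):

```python
# Sketch: deriving inverse-frequency class weights from a target series.
# y here is a synthetic stand-in for y_train with a ~90.5% / 9.5% split.
import pandas as pd

y = pd.Series([0] * 905 + [1] * 95)

freq = y.value_counts(normalize=True)                      # class frequencies
weights = {0: float(freq.loc[1]), 1: float(freq.loc[0])}   # weight each class by the other's frequency

print(freq.loc[0], freq.loc[1])  # 0.905 0.095
print(weights)                   # {0: 0.095, 1: 0.905}
```

Weighting the minority class by the majority frequency makes the tree penalize missed loan acceptors much more heavily than missed non-acceptors.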

In [ ]:
# Creating the decision tree model, weighting each class by the other's frequency
t_0 = DecisionTreeClassifier(criterion="gini", class_weight={0: 0.094571, 1: 0.905429}, random_state=1)
In [ ]:
#fitting the training data
t_0.fit(X_train, y_train)
Out[ ]:
DecisionTreeClassifier(class_weight={0: 0.094571, 1: 0.905429}, random_state=1)

Model performance Evaluation of T_o

In [ ]:
#Calculating the Recall for train and test data
Recall_Train_T_0 = get_recall_score(t_0, X_train, y_train)
print(f'Recall for T_0 on Train Data = {Recall_Train_T_0}')
Recall_Test_T_0 = get_recall_score(t_0, X_test, y_test)
print(f'Recall for T_0 on Test Data = {Recall_Test_T_0}')
Recall for T_0 on Train Data = 1.0
Recall for T_0 on Test Data = 0.85

A mismatch is observed between train and test set performance, suggesting that the T_0 model is overfitting. Let us examine the confusion matrix and the tree structure further.

In [ ]:
confusion_matrix_sklearn(t_0, X_train, y_train)
No description has been provided for this image

As suspected, the model overfits the training data perfectly: the confusion matrix shows 0% false negatives and 0% false positives

In [ ]:
#plotting the tree
plot_tree(t_0,X_train)
No description has been provided for this image

Observation

The original tree T_0 is complex and overfits the training data set; pre-pruning and post-pruning will be considered to improve model performance.

The first split in T_0 is on the "Income" variable; let us first examine the feature importances

In [ ]:
# importance of features in the tree building (The importance of a feature is computed as the
# (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance )

def view_nd_plot_importance(model, predictors):
    print("The features importances:")
    print(
        pd.DataFrame(
            model.feature_importances_, columns=["Imp"], index=predictors.columns
        ).sort_values(by="Imp", ascending=False)
    )

    importances = model.feature_importances_
    indices = np.argsort(importances)

    plt.figure(figsize=(12, 65))
    plt.title("Feature Importances")
    plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
    plt.yticks(range(len(indices)), [predictors.columns[i] for i in indices])
    plt.xlabel("Relative Importance")
    plt.show()
In [ ]:
view_nd_plot_importance(t_0, X_train)
The features importances:
                          Imp
Income               0.635549
Education_2          0.143484
CCAvg                0.085613
Education_3          0.048952
Family               0.041663
...                       ...
City_Torrance        0.000000
City_Trinity Center  0.000000
City_Tustin          0.000000
City_Ukiah           0.000000
State_CA             0.000000

[257 rows x 1 columns]
No description has been provided for this image

It is observed that the most important variable is Income, and the least important are the city dummies (except Los Angeles, which contributes slightly to the model's predictive power). The top 5 variables are:

Income

Education_2

CCAvg

Education_3

Family
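A top-k list like this can be pulled out in one line; a minimal sketch on synthetic data (the notebook's `view_nd_plot_importance` prints the full sorted table instead):

```python
# Sketch: extracting the top-k feature importances as a sorted Series.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=1)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(8)])

dt = DecisionTreeClassifier(random_state=1).fit(X, y)

# Gini importances sum to 1; nlargest returns them in descending order
top5 = pd.Series(dt.feature_importances_, index=X.columns).nlargest(5)
print(top5)
```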

Model performance improvement¶

1. Pre-pruning¶

In [ ]:
#let us get the max depth of T_0 to have an idea how to tune our parameters
print(f'''The max depth of the t_0 =  {t_0.tree_.max_depth}
The node_count ={t_0.tree_.node_count}
The number of leaves = {t_0.tree_.n_leaves}''')
The max depth of the t_0 =  20
The node_count =179
The number of leaves = 90
In [ ]:
# Choose the type of classifier.
t_grid = DecisionTreeClassifier(random_state=1, class_weight={0: 0.094571, 1: 0.905429})

# Grid of parameters to choose from
parameters = {
    "max_depth": [5, 10, 15, 20, None],
    "criterion": ["entropy", "gini"],
    "splitter": ["best", "random"],
    'min_samples_leaf': [1, 2, 5, 7, 10,15,20],
    'max_leaf_nodes' : [2, 3, 5, 10],
    "min_impurity_decrease": [0.00001, 0.0001, 0.01],
}

# Type of scoring used to compare parameter combinations
scorer = make_scorer(recall_score)

# Run the grid search
grid_obj = GridSearchCV(t_grid, parameters, scoring=scorer, cv=5)
grid_obj = grid_obj.fit(X_train, y_train)

# Set the clf to the best combination of parameters
t_grid = grid_obj.best_estimator_

# Fit the best algorithm to the data.
t_grid.fit(X_train, y_train)
Out[ ]:
DecisionTreeClassifier(class_weight={0: 0.094571, 1: 0.905429},
                       criterion='entropy', max_depth=5, max_leaf_nodes=3,
                       min_impurity_decrease=1e-05, random_state=1)

Model Performance Evaluation_Pre-pruned Tree T_1

In [ ]:
# Choose the type of classifier.
t_1 = DecisionTreeClassifier(random_state=1,max_depth=5, criterion='entropy'
                             , class_weight={0: 0.094571, 1: 0.905429})
In [ ]:
t_1.fit(X_train,y_train)
Out[ ]:
DecisionTreeClassifier(class_weight={0: 0.094571, 1: 0.905429},
                       criterion='entropy', max_depth=5, random_state=1)

Model Performance Evaluation_Pre-pruned Tree T_1

In [ ]:
Recall_Train_T_1 = get_recall_score(t_1, X_train, y_train)
print(f'Recall for T_1 on Train Data = {Recall_Train_T_1}')
Recall_Test_T_1 = get_recall_score(t_1, X_test, y_test)
print(f'Recall for T_1 on Test Data = {Recall_Test_T_1}')
Recall for T_1 on Train Data = 0.9921052631578947
Recall for T_1 on Test Data = 0.92

Displaying the tree, the confusion matrix, and the variable importances

In [ ]:
confusion_matrix_sklearn(t_1, X_train, y_train)
plot_tree(t_1,X_train)
view_nd_plot_importance(t_1, X_train)
No description has been provided for this image
No description has been provided for this image
The features importances:
                         Imp
Income              0.620903
Education_2         0.137543
CCAvg               0.116768
Family              0.057203
Education_3         0.054307
...                      ...
City_Hermosa Beach  0.000000
City_Highland       0.000000
City_Hollister      0.000000
City_Hopland        0.000000
City_Fresno         0.000000

[257 rows x 1 columns]
No description has been provided for this image

Observation

With max_depth=5, criterion='entropy', and default values for the remaining hyperparameters, the model's performance on the test set is better than with max_depth=None and criterion='gini'.

Recall values:

Recall for T_1 on Train Data = 0.9921

Recall for T_1 on Test Data = 0.9200

Features with the highest importance:

Income 0.6209

Education_2 0.1375

Lower importance, yet still contributing to predictive power:

CCAvg 0.1168

Family 0.0572

Education_3 0.0543

Confusion matrix:

FN at 0.07%

FP at 3.95%

Tuning further hyperparameters to derive model T_2

In [ ]:
#Choose the type of classifier.
t_2 = DecisionTreeClassifier(random_state=1,max_depth=5, criterion='entropy'
                             , class_weight={0: 0.094571, 1: 0.905429},max_leaf_nodes=3,
                       min_impurity_decrease=1e-05)
In [ ]:
t_2.fit(X_train,y_train)
Out[ ]:
DecisionTreeClassifier(class_weight={0: 0.094571, 1: 0.905429},
                       criterion='entropy', max_depth=5, max_leaf_nodes=3,
                       min_impurity_decrease=1e-05, random_state=1)

Model Performance Evaluation_Pre-pruned Tree T_2

In [ ]:
Recall_Train_T_2 = get_recall_score(t_2, X_train, y_train)
print(f'Recall for T_2 on Train Data = {Recall_Train_T_2}')
Recall_Test_T_2 = get_recall_score(t_2, X_test, y_test)
print(f'Recall for T_2 on Test Data = {Recall_Test_T_2}')
Recall for T_2 on Train Data = 0.9921052631578947
Recall for T_2 on Test Data = 1.0

The feature importances of this model are:

Features with the highest importance:

Income 0.6222

Education_2 0.1291

Lower importance, yet still contributing to predictive power:

CCAvg 0.1222

Family 0.0581

Education_3 0.0531

2. Post-pruning¶

In [ ]:
#defining the classifier
PP_t_0 = DecisionTreeClassifier(random_state=1, class_weight={0: 0.094571, 1: 0.905429})

#defining the cost-complexity pruning path
path = PP_t_0.cost_complexity_pruning_path(X_train, y_train)

#Extracting the ccp_alphas and impurities from the path
ccp_alphas, impurities = path.ccp_alphas, path.impurities
In [ ]:
#displaying ccp_alphas vs impurities to show that impurity increases as alpha increases
pd.DataFrame(path)
Out[ ]:
ccp_alphas impurities
0 0.000000e+00 -1.221160e-14
1 1.346069e-18 -1.221025e-14
2 1.346069e-18 -1.220891e-14
3 1.927326e-18 -1.220698e-14
4 2.141473e-18 -1.220484e-14
5 2.692138e-18 -1.220215e-14
6 3.212210e-18 -1.219894e-14
7 4.129984e-18 -1.219481e-14
8 7.648118e-18 -1.218716e-14
9 7.648118e-18 -1.217951e-14
10 7.908154e-18 -1.217160e-14
11 2.489602e-17 -1.214670e-14
12 1.077161e-16 -1.203899e-14
13 1.265304e-16 -1.191246e-14
14 1.364192e-04 5.456769e-04
15 1.366781e-04 8.190331e-04
16 2.521427e-04 2.836175e-03
17 2.530267e-04 5.113415e-03
18 2.555348e-04 5.880019e-03
19 2.618762e-04 6.665648e-03
20 2.649322e-04 7.725377e-03
21 2.662816e-04 7.991658e-03
22 2.715014e-04 8.263160e-03
23 3.440114e-04 9.639205e-03
24 4.585411e-04 1.055629e-02
25 4.821981e-04 1.103849e-02
26 5.146570e-04 1.155314e-02
27 5.704040e-04 1.269395e-02
28 5.918556e-04 1.387766e-02
29 6.537767e-04 1.453144e-02
30 8.471994e-04 1.707304e-02
31 1.030111e-03 1.810315e-02
32 1.030872e-03 1.913402e-02
33 1.139805e-03 2.027382e-02
34 1.563743e-03 2.183757e-02
35 1.567932e-03 2.497343e-02
36 1.647632e-03 2.662106e-02
37 2.012975e-03 3.064701e-02
38 2.551760e-03 3.575053e-02
39 2.683713e-03 3.843425e-02
40 2.819703e-03 4.407365e-02
41 3.163018e-03 4.723667e-02
42 3.251437e-03 5.373954e-02
43 4.793461e-03 6.811993e-02
44 2.088058e-02 8.900051e-02
45 3.825469e-02 2.037646e-01
46 2.962323e-01 4.999969e-01
In [ ]:
#let us plot the alphas VS the impurities
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")
plt.show()
No description has been provided for this image

Observation:

  1. It is observed that the impurity curve rises steeply after alpha ≈ 0.005

  2. Now train a decision tree for each effective alpha and observe how the tree depth varies with alpha

In [ ]:
#an empty list to collect the post-pruned trees (PP_trees)

PP_trees = []
for alpha in ccp_alphas:
    PP_tree = DecisionTreeClassifier(
        random_state=1, ccp_alpha=alpha, class_weight={0: 0.094571, 1: 0.905429}
    )
    PP_tree.fit(X_train, y_train)
    PP_trees.append(PP_tree)
print(
    "Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
        PP_trees[-1].tree_.node_count, ccp_alphas[-1]
    )
)
Number of nodes in the last tree is: 1 with ccp_alpha: 0.2962323013249329

Action:

Remove the last elements of PP_trees and ccp_alphas, as they correspond to the trivial single-node tree, then visualize node count and depth (i.e. tree complexity) as alpha varies

In [ ]:
PP_trees = PP_trees[:-1]
ccp_alphas = ccp_alphas[:-1]
In [ ]:
node_counts = [PP_tree.tree_.node_count for PP_tree in PP_trees]
depth = [PP_tree.tree_.max_depth for PP_tree in PP_trees]
fig, ax = plt.subplots(2, 1, figsize=(10, 7))
ax[0].plot(ccp_alphas, node_counts, marker="o", drawstyle="steps-post")
ax[0].set_xlabel("alpha")
ax[0].set_ylabel("number of nodes")
ax[0].set_title("Number of nodes vs alpha")
ax[1].plot(ccp_alphas, depth, marker="o", drawstyle="steps-post")
ax[1].set_xlabel("alpha")
ax[1].set_ylabel("depth of tree")
ax[1].set_title("Depth vs alpha")
fig.tight_layout()
[Figure: Number of nodes vs alpha (top) and depth of tree vs alpha (bottom)]

Observation

  1. At alpha ≈ 0.005, the tree has reached its smallest size and is underfitting the data.

  2. This suggests that the alpha value giving the optimum model performance lies below 0.05.

  3. Next, observe how the model's recall varies with alpha on the training and test sets.

In [ ]:
# derive the recall values of all PP_trees on the train set
recall_train = []
for PP_tree in PP_trees:
    y_pred_train = PP_tree.predict(X_train)
    values_train = recall_score(y_train, y_pred_train)
    recall_train.append(values_train)
In [ ]:
# derive the recall values of all PP_trees on the test set
recall_test = []
for PP_tree in PP_trees:
    y_pred_test = PP_tree.predict(X_test)
    values_test = recall_score(y_test, y_pred_test)
    recall_test.append(values_test)
In [ ]:
# calculate the accuracy of the trees on the train and test sets
train_scores = [PP_tree.score(X_train, y_train) for PP_tree in PP_trees]
test_scores = [PP_tree.score(X_test, y_test) for PP_tree in PP_trees]
In [ ]:
#plotting the accuracy for test and training sets
fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlabel("alpha")
ax.set_ylabel("Accuracy")
ax.set_title("Accuracy vs alpha for training and testing sets")
ax.plot(
    ccp_alphas, train_scores, marker="o", label="train", drawstyle="steps-post",
)
ax.plot(ccp_alphas, test_scores, marker="o", label="test", drawstyle="steps-post")
ax.legend()
plt.show()
[Figure: Accuracy vs alpha for the training and testing sets]

Observation

  1. We observe better performance at alpha below 0.05.

  2. The best accuracy occurs at alpha close to zero, which may still reflect an overfitted tree.

  3. Another option lies near 0.03 or 0.035. However, accuracy is not the optimum performance measure for this problem, so we must find the best recall value instead.
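Why accuracy misleads here: with roughly 9% of customers accepting the loan, a model that predicts "no loan" for everyone still scores high accuracy while catching zero buyers. A minimal illustration with toy labels (not the project data):

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Toy labels mimicking the ~9% positive rate: a model that predicts
# "no loan" for everyone is 91% accurate yet catches zero buyers.
y_true = np.array([1] * 9 + [0] * 91)
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))  # 0.91
print(recall_score(y_true, y_pred, zero_division=0))  # 0.0
```

This is why recall, not accuracy, drives the model selection below.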

In [ ]:
#plotting the recall scores for test and training sets VS Alpha
fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlabel("alpha")
ax.set_ylabel("Recall")
ax.set_title("Recall vs alpha for training and testing sets")
ax.plot(
    ccp_alphas, recall_train, marker="o", label="train", drawstyle="steps-post",
)
ax.plot(ccp_alphas, recall_test, marker="o", label="test", drawstyle="steps-post")
ax.legend()
plt.show()
[Figure: Recall vs alpha for the training and testing sets]
In [ ]:
# find the alpha that maximizes recall on the test set
index_best_model = np.argmax(recall_test)
print(f'''The recall value giving the best model is: {recall_test[index_best_model]}
The best alpha value is: {ccp_alphas[index_best_model]}''')
The recall value giving the best model is: 0.98
The best alpha value is: 0.002683713283947349

Deriving the best model PP_t_best_1 and fitting it to the train set

In [ ]:
PP_t_best_1 = PP_trees[index_best_model]
PP_t_best_1.fit(X_train, y_train)
Out[ ]:
DecisionTreeClassifier(ccp_alpha=np.float64(0.002683713283947349),
                       class_weight={0: 0.094571, 1: 0.905429}, random_state=1)

Model Performance Evaluation of Post-pruned Tree PP_t_best_1¶

In [ ]:
#Calculating the Recall for train and test data
Recall_Train_PP_t_best_1 = get_recall_score(PP_t_best_1, X_train, y_train)
print(f'Recall for PP_t_best_1 on Train Data = {Recall_Train_PP_t_best_1}')

Recall_Test_PP_t_best_1 = get_recall_score(PP_t_best_1, X_test, y_test)
print(f'Recall for PP_t_best_1 on Test Data = {Recall_Test_PP_t_best_1}')
Recall for PP_t_best_1 on Train Data = 1.0
Recall for PP_t_best_1 on Test Data = 0.98

Observation

  1. The recall on the train data set is 1.0.

  2. This is close to the recall on the test data set (0.98), which is still better than that of t_0.

  3. The model would nevertheless benefit from further improvement.

  4. Derive the tree, the decision table, and the feature importances for this model, PP_t_best_1.

  5. Then derive another model at the second peak of alpha.

In [ ]:
#plotting the tree, variables importance and the confusion matrix
print(f'The tree depth is : {PP_t_best_1.tree_.max_depth}')
confusion_matrix_sklearn(PP_t_best_1, X_train, y_train)
plot_tree(PP_t_best_1,X_train)
view_nd_plot_importance(PP_t_best_1, X_train)
The tree depth is : 6
[Figure: Confusion matrix for PP_t_best_1 on the train set]
[Figure: Post-pruned decision tree PP_t_best_1]
The features importances:
                        Imp
Income             0.671525
Education_2        0.155432
CCAvg              0.079833
Education_3        0.052849
Family             0.040362
...                     ...
City_Garden Grove  0.000000
City_Fullerton     0.000000
City_Fresno        0.000000
City_Fremont       0.000000
City_Greenbrae     0.000000

[257 rows x 1 columns]
[Figure: Feature importances for PP_t_best_1]

Observation on PP_t_best_1 Model

  1. At ccp_alpha=0.002683713283947349 and max_depth=6 the performance summary is:

a. Recall values:

i. Recall for PP_t_best_1 on Train Data = 1.0

ii. Recall for PP_t_best_1 on Test Data = 0.98

b. Features with max importance: (a) Income (b) Education_2

i. Less important, yet still having a predictive effect: (a) CCAvg (b) Education_3 (c) Family

c. Confusion matrix:

i. FN at 0.00%

ii. FP at 4.62%

In [ ]:
# create a data frame of alpha, train recall, and test recall
df = pd.DataFrame(
    {"ccp_alphas": ccp_alphas, "recall_train": recall_train, "recall_test": recall_test}
)
df
Out[ ]:
ccp_alphas recall_train recall_test
0 0.000000e+00 1.000000 0.85
1 1.346069e-18 1.000000 0.85
2 1.346069e-18 1.000000 0.85
3 1.927326e-18 1.000000 0.85
4 2.141473e-18 1.000000 0.85
5 2.692138e-18 1.000000 0.85
6 3.212210e-18 1.000000 0.85
7 4.129984e-18 1.000000 0.85
8 7.648118e-18 1.000000 0.85
9 7.648118e-18 1.000000 0.85
10 7.908154e-18 1.000000 0.85
11 2.489602e-17 1.000000 0.85
12 1.077161e-16 1.000000 0.85
13 1.265304e-16 1.000000 0.85
14 1.364192e-04 1.000000 0.85
15 1.366781e-04 1.000000 0.85
16 2.521427e-04 1.000000 0.85
17 2.530267e-04 1.000000 0.85
18 2.555348e-04 1.000000 0.87
19 2.618762e-04 1.000000 0.88
20 2.649322e-04 1.000000 0.88
21 2.662816e-04 1.000000 0.88
22 2.715014e-04 1.000000 0.88
23 3.440114e-04 1.000000 0.89
24 4.585411e-04 1.000000 0.89
25 4.821981e-04 1.000000 0.89
26 5.146570e-04 1.000000 0.89
27 5.704040e-04 1.000000 0.89
28 5.918556e-04 1.000000 0.90
29 6.537767e-04 1.000000 0.90
30 8.471994e-04 1.000000 0.92
31 1.030111e-03 1.000000 0.92
32 1.030872e-03 1.000000 0.92
33 1.139805e-03 1.000000 0.93
34 1.563743e-03 1.000000 0.93
35 1.567932e-03 1.000000 0.94
36 1.647632e-03 1.000000 0.94
37 2.012975e-03 1.000000 0.94
38 2.551760e-03 1.000000 0.95
39 2.683713e-03 1.000000 0.98
40 2.819703e-03 0.992105 0.96
41 3.163018e-03 0.992105 0.97
42 3.251437e-03 0.992105 0.97
43 4.793461e-03 0.992105 0.97
44 2.088058e-02 0.950000 0.88
45 3.825469e-02 0.957895 0.91

At row 42 above, the train and test recall values occur at an alpha near the 0.003 threshold seen earlier on the curve.

This value looks like a good fit, as it is not too close to an underfitting model; we therefore extract it and observe how it behaves on the tree and confusion matrix.
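The tie-break applied here, preferring a slightly larger alpha whose recall is nearly as good as the maximum, can be made explicit in code. A small sketch with hypothetical alpha/recall arrays standing in for the notebook's ccp_alphas and recall_test:

```python
import numpy as np

# Hypothetical recall-vs-alpha results (assumption: same shapes as the
# notebook's ccp_alphas and recall_test arrays, values illustrative only).
ccp_alphas = np.array([0.0000, 0.0015, 0.0027, 0.0033, 0.0048, 0.0209])
recall_test = np.array([0.85, 0.94, 0.98, 0.97, 0.97, 0.88])

# Prefer the simplest tree (largest alpha) whose test recall falls within
# a small tolerance of the best score -- the idea used to justify row 42.
tol = 0.01
best = recall_test.max()
candidates = np.where(recall_test >= best - tol)[0]
chosen = candidates[np.argmax(ccp_alphas[candidates])]
print(chosen, ccp_alphas[chosen], recall_test[chosen])  # 4 0.0048 0.97
```

This trades a fraction of recall for a smaller tree that generalizes better.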

In [ ]:
df.iloc[42]
Out[ ]:
42
ccp_alphas 0.003251
recall_train 0.992105
recall_test 0.970000

In [ ]:
# extract the tree at index 42 and fit it to the train set
PP_t_best_2 = PP_trees[42]
PP_t_best_2.fit(X_train, y_train)
Out[ ]:
DecisionTreeClassifier(ccp_alpha=np.float64(0.003251436542468189),
                       class_weight={0: 0.094571, 1: 0.905429}, random_state=1)

Model Performance Evaluation of Post-pruned Tree PP_t_best_2¶

In [ ]:
#Calculating the Recall for train and test data
Recall_Train_PP_t_best_2 = get_recall_score(PP_t_best_2, X_train, y_train)
print(f'Recall for PP_t_best_2 on Train Data = {Recall_Train_PP_t_best_2}')

Recall_Test_PP_t_best_2 = get_recall_score(PP_t_best_2, X_test, y_test)
print(f'Recall for PP_t_best_2 on Test Data = {Recall_Test_PP_t_best_2}')
Recall for PP_t_best_2 on Train Data = 0.9921052631578947
Recall for PP_t_best_2 on Test Data = 0.97

Observation

  1. The recall on the train and test data sets matches the values in the dataframe, as expected.

  2. Now visualize the tree, the confusion matrix, and the feature importances.

In [ ]:
#plotting the tree, variables importance and the confusion matrix
print(f'The tree depth is : {PP_t_best_2.tree_.max_depth}')
confusion_matrix_sklearn(PP_t_best_2, X_train, y_train)
plot_tree(PP_t_best_2,X_train)
view_nd_plot_importance(PP_t_best_2, X_train)
The tree depth is : 5
[Figure: Confusion matrix for PP_t_best_2 on the train set]
[Figure: Post-pruned decision tree PP_t_best_2]
The features importances:
                        Imp
Income             0.684178
Education_2        0.160763
CCAvg              0.058652
Education_3        0.054661
Family             0.041746
...                     ...
City_Garden Grove  0.000000
City_Fullerton     0.000000
City_Fresno        0.000000
City_Fremont       0.000000
City_Greenbrae     0.000000

[257 rows x 1 columns]
[Figure: Feature importances for PP_t_best_2]

Observation on PP_t_best_2 Model

  1. At ccp_alpha=0.003251 and max_depth=5, PP_t_best_2 trades slightly lower train and test recall for a simpler, shallower tree that is less prone to overfitting than PP_t_best_1. The performance summary is:

a. Recall values:

i. Recall for PP_t_best_2 on Train Data = 0.9921052631578947

ii. Recall for PP_t_best_2 on Test Data = 0.97

b. Features with max importance:

i. Income 0.684178

ii. Education_2 0.160763

c. Less important, yet still having a predictive effect:

i. CCAvg 0.058652

ii. Education_3 0.054661

iii. Family 0.041746

d. Confusion matrix:

i. FN at 0.07%

ii. FP at 5.88%

Model Performance Evaluation of Post-pruned Tree PP_t_best_3¶

In [ ]:
df.iloc[42]
Out[ ]:
42
ccp_alphas 0.003251
recall_train 0.992105
recall_test 0.970000

In [ ]:
# extract the tree at index 42 and fit it to the train set
PP_t_best_3 = PP_trees[42]
PP_t_best_3.fit(X_train, y_train)
Out[ ]:
DecisionTreeClassifier(ccp_alpha=np.float64(0.003251436542468189),
                       class_weight={0: 0.094571, 1: 0.905429}, random_state=1)
In [ ]:
#Calculating the Recall for train and test data
Recall_Train_PP_t_best_3 = get_recall_score(PP_t_best_3, X_train, y_train)
print(f'Recall for PP_t_best_3 on Train Data = {Recall_Train_PP_t_best_3}')
Recall_Test_PP_t_best_3 = get_recall_score(PP_t_best_3, X_test, y_test)
print(f'Recall for PP_t_best_3 on Test Data = {Recall_Test_PP_t_best_3}')
Recall for PP_t_best_3 on Train Data = 0.9921052631578947
Recall for PP_t_best_3 on Test Data = 0.97

The recall on the test and train data sets looks good.

Visualizing the tree, the confusion matrix, and the feature importances

In [ ]:
#plotting the tree, variables importance and the confusion matrix
print(f'The tree depth is : {PP_t_best_3.tree_.max_depth}')
confusion_matrix_sklearn(PP_t_best_3, X_train, y_train)
plot_tree(PP_t_best_3,X_train)
view_nd_plot_importance(PP_t_best_3, X_train)
The tree depth is : 5
[Figure: Confusion matrix for PP_t_best_3 on the train set]
[Figure: Post-pruned decision tree PP_t_best_3]
The features importances:
                        Imp
Income             0.684178
Education_2        0.160763
CCAvg              0.058652
Education_3        0.054661
Family             0.041746
...                     ...
City_Garden Grove  0.000000
City_Fullerton     0.000000
City_Fresno        0.000000
City_Fremont       0.000000
City_Greenbrae     0.000000

[257 rows x 1 columns]
[Figure: Feature importances for PP_t_best_3]

Observation on PP_t_best_3 Model

At ccp_alpha=0.003251 and max_depth=5, PP_t_best_3 performs on par with PP_t_best_2 while keeping the tree simple.

The performance summary is:

Recall values:

Recall for PP_t_best_3 on Train Data = 0.9921052631578947

Recall for PP_t_best_3 on Test Data = 0.97

Features with max importance:

Income 0.684178

Education_2 0.160763

Less important, yet still having a predictive effect:

CCAvg 0.058652

Education_3 0.054661

Family 0.041746

Confusion matrix:

FN at 0.07%

FP at 5.88%

Decision Tree Conclusion

  1. The third and final model has a shallow tree depth of 5, hence lower tree complexity at a larger alpha value, and it successfully avoided overfitting. The best-performing model on the test set is therefore PP_t_best_3, with the values below.

At ccp_alpha=0.003251 and max_depth=5, PP_t_best_3 performs on par with PP_t_best_2.

The performance summary is:

Recall values:

Recall for PP_t_best_3 on Train Data = 0.9921052631578947

Recall for PP_t_best_3 on Test Data = 0.97

Features with max importance:

Income 0.684178

Education_2 0.160763

Less important, yet still having a predictive effect:

CCAvg 0.058652

Education_3 0.054661

Family 0.041746

Confusion matrix:

FN at 0.07%

FP at 5.88%

Model Comparison and Final Model Selection¶

Observation and conclusion:

At threshold 0.10, the recall on the test and train data sets is 87% and 91% respectively: a very good result for the logistic regression model, which brings the FN rate down to only 1.20% on the test set while maintaining a precision of approximately 50% on both datasets.
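The recall gain comes from lowering the decision threshold from the default 0.50 to 0.10. A minimal sketch on synthetic data (a stand-in for the notebook's lg model; lowering the threshold can only add predicted positives, so recall never decreases):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# Synthetic stand-in for the bank data (~10% positives); the notebook's
# lg model would be thresholded the same way on its own features.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=1)
lg = LogisticRegression(max_iter=1000).fit(X, y)

proba = lg.predict_proba(X)[:, 1]
recall_default = recall_score(y, (proba >= 0.50).astype(int))
recall_low = recall_score(y, (proba >= 0.10).astype(int))
print(recall_default, recall_low)
```

The cost of the lower threshold is more false positives, which is why precision settles near 50% here.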


Actionable Insights and Business Recommendations¶

Final models comparison:

Modelling Algorithm          Model Name              Recall (train)  Recall (test)
Logistic Regression          lg with threshold 0.1   0.92            0.88
Decision Tree (pre-pruned)   t_1                     0.99            0.95
Decision Tree (post-pruned)  PP_t_best_3             0.99            0.97

Insights:

The best-performing model was derived from the decision tree approach: the original tree, post-pruned with ccp_alpha=0.003251, gave the following recall values for the train and test data sets:

Recall for PP_t_best_3 on Train Data = 0.9921052631578947 and Recall for PP_t_best_3 on Test Data = 0.97

The statistical evidence shows that the features which most affect a client's decision to accept a personal loan are, in priority order:

  1. Income: the higher the income, the more likely the customer is to accept a personal loan.

  2. Education_2: customers with education level 2 are more willing to accept a personal loan than those at levels 1 and 3.

  3. CCAvg: as a customer's average monthly credit-card spending increases, so does the willingness to accept a personal loan.

  4. Education_3: customers with education level 3 are more willing to accept a personal loan than those at level 1.

  5. Family: as family size grows, customers are more willing to accept a personal loan.
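These rankings translate directly into a targeting workflow. A hedged sketch on synthetic data (in practice the fitted PP_t_best_3 would score the bank's real liability-customer table):

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic customers stand in for the bank's feature matrix (assumption:
# ~10% positives, as in the campaign data).
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=1)
model = DecisionTreeClassifier(
    random_state=1, ccp_alpha=0.003, class_weight="balanced"
).fit(X, y)

# Rank customers by predicted purchase probability; the top decile
# becomes the campaign's priority target list.
scores = model.predict_proba(X)[:, 1]
targets = pd.DataFrame({"customer_id": np.arange(len(X)), "p_loan": scores})
top_decile = targets.nlargest(len(targets) // 10, "p_loan")
print(top_decile.shape)  # (100, 2)
```

The marketing team can then focus relationship managers and mailings on this ranked list rather than the whole depositor base.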

What recommendations would you suggest to the bank?¶

  1. The marketing department should study customer profiles before approaching them with a personal loan offer.

  2. AllLife Bank should apply various strategies to sell more personal loan packages, especially by assigning dedicated relationship managers to high-profile customers.

  3. The bank should also consider monthly/quarterly follow-ups with average- to mid-profile customers to learn how to attract more of them to take a personal loan.

  4. The bank should consider rigorous mail marketing directed at high- and mid-profile customers so they are properly informed about getting pre-approved for personal loans.

  5. Income is the most important feature in the decision tree model: if a customer's yearly income is below USD 92,500, the customer is unlikely to accept a personal loan.

  6. Conversely, customers with an income greater than USD 92,500 and an education level of 3 (Advanced/Professional) were most likely to take a personal loan, so targeted marketing to this group is very important.

  7. Finally, customers using online banking services were more likely to have personal loans. The bank should therefore improve its website, make it more user-friendly, and encourage customers who don't yet use online banking to adopt it.

  8. Making the personal loan application available on mobile devices and reducing the online application to a few clicks would improve the customer experience when applying for a personal loan.

  9. Customers' location of residence has no real impact on their decision to accept a personal loan, provided the bank follows points 1-8 above.

In [ ]:
!pip install nbconvert
In [163]:
%%shell
jupyter nbconvert --to html '/content/drive/My Drive/Machine_Learning_SL_Full_Code_Akomolafe_Samson_Updated_a.ipynb'
[NbConvertApp] WARNING | pattern '/content/drive/My Drive/Machine_Learning_SL_Full_Code_Akomolafe_Samson_Updated_a.ipynb' matched no files